# Streaming
Both the Chat Completions and Responses endpoints support streaming via Server-Sent Events (SSE). Streaming delivers tokens incrementally as the model generates them, instead of waiting for the full response.
## Enabling streaming

Set `stream: true` in the request body:
```bash
curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "Write a haiku about code."}],
    "stream": true
  }'
```
The response is sent with `Content-Type: text/event-stream` and headers that disable caching and proxy buffering:

```
Content-Type: text/event-stream; charset=utf-8
Cache-Control: no-cache
Connection: keep-alive
X-Accel-Buffering: no
```
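One practical consequence: error responses (bad key, unknown model) generally come back as an ordinary JSON body rather than SSE, so a robust client checks `Content-Type` before parsing events. A minimal sketch, assuming errors arrive as JSON (`is_event_stream` and `open_stream` are illustrative helpers, not part of any SDK):

```python
def is_event_stream(content_type: str) -> bool:
    """True when a response body should be parsed as SSE."""
    return "text/event-stream" in content_type.lower()

def open_stream(url: str, api_key: str, payload: dict):
    import httpx  # deferred so the helper above has no dependencies

    with httpx.stream("POST", url, json=payload,
                      headers={"Authorization": f"Bearer {api_key}"}) as resp:
        if not is_event_stream(resp.headers.get("content-type", "")):
            resp.read()  # error bodies are plain JSON, safe to load eagerly
            raise RuntimeError(f"non-stream response: {resp.text}")
        yield from resp.iter_lines()
```

Because `open_stream` is a generator, the connection stays open only while the caller iterates over it.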
## Chat Completions stream format

Each event is a `data:` line containing a JSON object. The stream ends with `data: [DONE]`.
```
data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Lines"},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" of"},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" code"},"finish_reason":null}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: {"id":"gen-abc","object":"chat.completion.chunk","usage":{"prompt_tokens":14,"completion_tokens":12,"total_tokens":26,"credits_used":200,"credits_remaining":999800}}

data: [DONE]
```
Some providers also send a usage-only chunk in which `choices` is an empty array (`[]`) before `[DONE]`, so client code must not assume `choices[0]` exists on every SSE event. The final chunk before `[DONE]` typically carries the `usage` object with credit information, sometimes alongside those empty `choices`.
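Putting both caveats together, a consumer can fold a chunk sequence into final text plus usage without ever indexing `choices[0]` blindly. A minimal sketch over already-decoded chunk dicts (field names match the example stream above; `fold_chunks` is an illustrative helper):

```python
import json

def fold_chunks(chunks):
    """Accumulate delta text and capture the trailing usage object."""
    text_parts, usage = [], None
    for chunk in chunks:
        usage = chunk.get("usage") or usage          # usage rides on a late chunk
        for choice in chunk.get("choices") or []:    # may be [] on usage-only chunks
            content = (choice.get("delta") or {}).get("content")
            if content:
                text_parts.append(content)
    return "".join(text_parts), usage

raw = [
    '{"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    '{"choices":[{"index":0,"delta":{"content":"Lines"}}]}',
    '{"choices":[{"index":0,"delta":{"content":" of code"}}]}',
    '{"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    '{"choices":[],"usage":{"total_tokens":26,"credits_used":200}}',
]
text, usage = fold_chunks(json.loads(r) for r in raw)
# text == "Lines of code"; usage carries the credit fields
```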
## Responses stream format

The Responses endpoint uses typed events with `event:` and `data:` lines:
```
event: response.created
data: {"id":"resp-abc","object":"response","status":"in_progress","model":"openai/gpt-4.1-mini","output":[]}

event: response.in_progress
data: {"id":"resp-abc","object":"response","status":"in_progress"}

event: response.output_item.added
data: {"type":"message","id":"msg-001","role":"assistant","content":[]}

event: response.content_part.added
data: {"type":"output_text","text":""}

event: response.output_text.delta
data: {"type":"output_text","delta":"Lines"}

event: response.output_text.delta
data: {"type":"output_text","delta":" of code"}

event: response.output_text.done
data: {"type":"output_text","text":"Lines of code flow..."}

event: response.content_part.done
data: {"type":"output_text","text":"Lines of code flow..."}

event: response.output_item.done
data: {"type":"message","id":"msg-001","role":"assistant","content":[...]}

event: response.completed
data: {"id":"resp-abc","object":"response","status":"completed","output":[...],"usage":{"input_tokens":14,"output_tokens":12,"credits_used":200,"credits_remaining":999800}}

data: [DONE]
```
Each event also includes a `sequence_number` that increases monotonically (omitted from the examples above).
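Because events arrive as `event:`/`data:` pairs, a Responses consumer needs a small state machine rather than a single-line filter. A sketch of such a parser over already-decoded lines (`parse_typed_events` is an illustrative helper; a spec-complete SSE parser would also treat blank lines as event boundaries):

```python
import json

def parse_typed_events(lines):
    """Yield (event_name, payload) pairs from event:/data: line pairs."""
    event = None
    for line in lines:
        if line.startswith("event: "):
            event = line[7:]
        elif line.startswith("data: "):
            data = line[6:]
            if data == "[DONE]":
                return
            yield event, json.loads(data)
            event = None  # each data line consumes the pending event name

stream = [
    "event: response.output_text.delta",
    'data: {"type":"output_text","delta":"Lines"}',
    "event: response.output_text.delta",
    'data: {"type":"output_text","delta":" of code"}',
    "event: response.completed",
    'data: {"id":"resp-abc","status":"completed"}',
    "data: [DONE]",
]
text = "".join(
    payload["delta"]
    for name, payload in parse_typed_events(stream)
    if name == "response.output_text.delta"
)
# text == "Lines of code"
```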
## Consuming streams

### Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

stream = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[{"role": "user", "content": "Write a haiku about code."}],
    stream=True,
)

for chunk in stream:
    # Usage-only chunks often have `choices: []`; never index `[0]` blindly.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="", flush=True)
print()
```
### Python (httpx, raw SSE)
```python
import json

import httpx

with httpx.stream(
    "POST",
    "https://api.quantized.us/v1/chat/completions",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [{"role": "user", "content": "Write a haiku about code."}],
        "stream": True,
    },
) as response:
    for line in response.iter_lines():
        if line.startswith("data: ") and line != "data: [DONE]":
            chunk = json.loads(line[6:])
            # Usage-only chunks may carry empty `choices`.
            choices = chunk.get("choices") or []
            if not choices:
                continue
            delta = choices[0].get("delta") or {}
            content = delta.get("content") or ""
            if content:
                print(content, end="", flush=True)
print()
```
### JavaScript
```javascript
const response = await fetch('https://api.quantized.us/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-quantized-YOUR-KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/gpt-4.1-mini',
    messages: [{ role: 'user', content: 'Write a haiku about code.' }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let text = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Network chunks can split an SSE line, so buffer until a full line arrives.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the trailing partial line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
    const chunk = JSON.parse(line.slice(6));
    // Guard against usage-only chunks with empty `choices`.
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) text += content;
  }
}
console.log(text);
```