Chat Completions
POST /v1/chat/completions
Generate a model response for a conversation. Compatible with the OpenAI Chat Completions API.
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | `Bearer <api-key-or-jwt>` |
| Content-Type | Yes | application/json |
| X-Quantized-Provider | No | Force a specific provider (openrouter, anthropic, bedrock) |
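The header table can be captured in a small helper. This is a minimal sketch: `build_headers` and `KNOWN_PROVIDERS` are hypothetical names, and the client-side check against the three listed providers is an assumption rather than documented server behavior.

```python
from typing import Optional

# Provider names copied from the X-Quantized-Provider row of the header table.
KNOWN_PROVIDERS = {"openrouter", "anthropic", "bedrock"}


def build_headers(api_key: str, provider: Optional[str] = None) -> dict:
    """Return the request headers; `provider` optionally pins routing."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider is not None:
        # Assumption: reject unlisted providers locally before sending.
        if provider not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown provider: {provider!r}")
        headers["X-Quantized-Provider"] = provider
    return headers
```

Omitting `provider` leaves the optional header out of the request.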
Request body
Required fields
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier (e.g., openai/gpt-4.1-mini) |
| messages | array | List of conversation messages |
Generation parameters
| Field | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | null | Maximum tokens in the completion (minimum: 1) |
| max_completion_tokens | integer | null | Alternative to max_tokens (minimum: 1) |
| temperature | float | null | Sampling temperature (0–2). Lower is more deterministic |
| top_p | float | null | Nucleus sampling threshold (0–1) |
| frequency_penalty | float | null | Penalize tokens by frequency (−2 to 2) |
| presence_penalty | float | null | Penalize tokens by presence (−2 to 2) |
| repetition_penalty | float | null | Repetition penalty factor (0–2). OpenRouter-specific |
| stop | string or array | null | Stop sequence(s) |
| seed | integer | null | Seed for deterministic generation (best effort) |
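The documented ranges can be checked on the client before a request is sent. The sketch below is illustrative only: `check_generation_params` is a hypothetical helper, and treating the bounds as inclusive is an assumption.

```python
# (low, high) bounds copied from the Generation parameters table.
# Assumption: both ends of each range are inclusive.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.0, 2.0),
}
# Fields documented with "minimum: 1".
MIN_ONE = ("max_tokens", "max_completion_tokens")


def check_generation_params(params: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low}..{high}")
    for name in MIN_ONE:
        value = params.get(name)
        if value is not None and value < 1:
            problems.append(f"{name}={value} must be at least 1")
    return problems
```

Fields that are absent or set to null are skipped, matching their documented default of null.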
Output control
| Field | Type | Default | Description |
|---|---|---|---|
| response_format | object | null | Output format: `{"type": "json_object"}` or `{"type": "json_schema", "json_schema": {...}}` |
When response_format.type is json_object or json_schema, the router guarantees that the choices[].message.content value can be passed directly to JSON.parse. Some models (notably Claude Haiku 4.5 when routed through Anthropic) wrap JSON output in markdown fences; the router strips them for you. See the Anthropic provider notes for details.
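In practice this means the content string can go straight into a JSON parser. A minimal sketch, where `json_request` and `parse_json_content` are hypothetical helper names and the sample response is made up for illustration:

```python
import json


def json_request(model: str, prompt: str) -> dict:
    """Build a request body that asks for a JSON object response."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
    }


def parse_json_content(response: dict):
    """Parse the first choice's content as JSON, with no fence stripping."""
    content = response["choices"][0]["message"]["content"]
    if content is None:
        raise ValueError("response has no content (tool call or refusal?)")
    return json.loads(content)


# Illustrative response, not real API output.
sample = {"choices": [{"message": {"role": "assistant", "content": "{\"city\": \"Paris\"}"}}]}
parsed = parse_json_content(sample)
```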
Tool calling
| Field | Type | Default | Description |
|---|---|---|---|
| tools | array | null | Tool/function definitions. See Tool format below |
| tool_choice | string or object | null | "auto", "none", "required", or `{"type": "function", "function": {"name": "..."}}` |
| parallel_tool_calls | boolean | null | Allow parallel tool execution |
Reasoning
| Field | Type | Default | Description |
|---|---|---|---|
| reasoning | object | null | Reasoning config for thinking models. effort: "none", "low", "medium", or "high". Optional exclude (boolean) controls whether reasoning content is included in the response. Example: `{"effort": "low", "exclude": false}` |
Streaming
| Field | Type | Default | Description |
|---|---|---|---|
| stream | boolean | false | Enable SSE streaming |
| stream_options | object | null | Streaming configuration, e.g. `{"include_usage": true}`. Usage is always included in the final SSE chunk regardless of this setting |
Advanced
| Field | Type | Default | Description |
|---|---|---|---|
| logprobs | boolean | null | Enable token log probabilities in the response |
| top_logprobs | integer | null | Number of most likely tokens to return per position (0–20). Requires logprobs: true |
| logit_bias | object | null | Map of token IDs to bias values (−100 to 100). Adjusts likelihood of specific tokens appearing in the output |
| user | string | null | End-user identifier for tracking and abuse detection |
The API uses strict parameter validation. Any field not listed above will be rejected with a 422 error. Parameters like top_k, modalities, audio, web_search_options, and metadata are not currently supported.
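Because unknown fields fail with a 422, it can help to screen a payload locally first. The following is a sketch only: `ALLOWED_FIELDS` is copied from the tables above, and `unsupported_fields` is a hypothetical helper, not part of the API.

```python
# Top-level request fields listed in the tables above.
ALLOWED_FIELDS = {
    "model", "messages",
    "max_tokens", "max_completion_tokens", "temperature", "top_p",
    "frequency_penalty", "presence_penalty", "repetition_penalty",
    "stop", "seed",
    "response_format",
    "tools", "tool_choice", "parallel_tool_calls",
    "reasoning",
    "stream", "stream_options",
    "logprobs", "top_logprobs", "logit_bias", "user",
}


def unsupported_fields(payload: dict) -> list:
    """Return the payload keys the API would reject, sorted by name."""
    return sorted(set(payload) - ALLOWED_FIELDS)
```

For example, a payload that also sets `top_k` and `metadata` yields `["metadata", "top_k"]`, so those keys can be dropped before sending.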
Messages
Each message must be an object with role (required) and content. Plain strings are not accepted.
Message fields
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of: system, user, assistant, tool |
| content | string, array, or null | No | Text content, content parts (for vision), or null (for tool call messages) |
| name | string | No | Participant name (for multi-user conversations) |
| tool_calls | array | No | Tool calls made by the assistant (in assistant messages) |
| tool_call_id | string | No | Links a tool response to its call (required for tool messages) |
| refusal | string or null | No | Model refusal text. Only valid in assistant messages |
| reasoning | string or null | No | Reasoning text from thinking models. Only valid in assistant messages |
| reasoning_details | array or null | No | Detailed reasoning steps. Only valid in assistant messages |
```json
{"role": "user", "content": "What is 2+2?"}
```
Roles
| Role | Description |
|---|---|
| system | Sets the model’s behavior and context |
| user | The user’s input |
| assistant | The model’s previous response (for multi-turn) |
| tool | Response from a tool call (must include tool_call_id) |
Text messages
```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"}
]
```
Multi-turn conversations
```json
[
  {"role": "user", "content": "What is the capital of France?"},
  {"role": "assistant", "content": "The capital of France is Paris."},
  {"role": "user", "content": "What about Germany?"}
]
```
Multimodal content parts
To send anything other than plain text, set content to an array of content parts. Each part declares a type and a type-specific payload. The following part types are accepted:
| Content type | Modality | Shape |
|---|---|---|
| text | text | `{"type": "text", "text": "..."}` |
| image_url | image | `{"type": "image_url", "image_url": {"url": "..."}}` |
| input_audio | audio | `{"type": "input_audio", "input_audio": {"data": "<base64>", "format": "mp3"}}` |
| video_url | video | `{"type": "video_url", "video_url": {"url": "..."}}` |
| file | document | `{"type": "file", "file": {"filename": "...", "file_data": "..."}}` |
Any other type value is rejected with a 422 error.
Image
Image URLs can be an HTTPS URL or a base64 data URI.
```json
[
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "What is in this image?"},
      {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
    ]
  }
]
```
HTTPS URL: https://example.com/image.png
Base64 data URI: data:image/jpeg;base64,/9j/4AAQ...
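When the image lives on disk or in memory rather than at a public URL, the data URI can be built with the standard library. A minimal sketch; `image_part_from_bytes` is a hypothetical helper name:

```python
import base64


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an image_url content part as a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

The returned part can be placed in a message's content array next to a text part.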
Audio
input_audio.data must be a raw base64 string (no data-URI prefix). input_audio.format is one of wav, mp3, aiff, aac, ogg, flac, m4a, pcm16, pcm24.
```json
[
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "Transcribe and summarize this audio."},
      {"type": "input_audio", "input_audio": {"data": "<base64-mp3>", "format": "mp3"}}
    ]
  }
]
```
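Unlike images, audio data is sent as raw base64 with the format named separately. The sketch below encodes bytes that way and checks the format against the list above; `audio_part_from_bytes` is a hypothetical helper, and validating the format locally is an assumption about what is convenient, not documented behavior.

```python
import base64

# Format names copied from the input_audio.format list above.
AUDIO_FORMATS = {"wav", "mp3", "aiff", "aac", "ogg", "flac", "m4a", "pcm16", "pcm24"}


def audio_part_from_bytes(data: bytes, fmt: str) -> dict:
    """Build an input_audio part with raw base64 data (no data-URI prefix)."""
    if fmt not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {fmt!r}")
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(data).decode("ascii"),
            "format": fmt,
        },
    }
```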
Video
video_url.url can be an HTTPS URL or a base64 data URI with a video MIME type (video/mp4, video/mpeg, video/mov, video/webm).
```json
[
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe what is happening in this video."},
      {"type": "video_url", "video_url": {"url": "data:video/mp4;base64,<base64>"}}
    ]
  }
]
```
Document (PDF)
file.file_data can be an HTTPS URL or a base64 data URI with application/pdf. Either file_data or file_id is required.
```json
[
  {
    "role": "user",
    "content": [
      {"type": "text", "text": "Summarize this document."},
      {
        "type": "file",
        "file": {
          "filename": "report.pdf",
          "file_data": "data:application/pdf;base64,<base64>"
        }
      }
    ]
  }
]
```
Model requirements
The model you target must declare support for each non-text modality you send. If a request contains audio parts but the chosen model’s input_modality.audio is false, the request is rejected with a 400:
```json
{"error": {"message": "Model 'openai/gpt-4.1-nano' does not support audio input"}}
```
See the Providers capability matrix for per-provider support, and the Models endpoint for per-model modality flags. PDFs (file) are routed through OpenRouter’s universal PDF parser and do not require a dedicated modality flag on the model.
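A request can be checked against a model's modality flags before it is sent. This is a rough sketch: `required_modalities` and `missing_modalities` are hypothetical helpers, and the `input_modality` dict is assumed to hold one boolean per modality, with `image` and `video` keys guessed by analogy with the `audio` flag mentioned above.

```python
# Content part types that need a modality flag, per the content parts table.
# "text" needs no flag, and "file" is excluded because PDFs go through the parser.
PART_MODALITY = {"image_url": "image", "input_audio": "audio", "video_url": "video"}


def required_modalities(messages: list) -> set:
    """Collect the non-text modalities used anywhere in the conversation."""
    needed = set()
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):  # string or null content carries no parts
            for part in content:
                modality = PART_MODALITY.get(part.get("type"))
                if modality:
                    needed.add(modality)
    return needed


def missing_modalities(messages: list, input_modality: dict) -> set:
    """Return the modalities the request uses but the model does not declare."""
    return {m for m in required_modalities(messages) if not input_modality.get(m, False)}
```

An empty result suggests the request should pass the modality check; anything else can be reported to the user before spending a round trip.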
Tool calls (multi-turn)
When the model calls a tool, continue the conversation by including the assistant’s tool call and your tool’s response:
```json
[
  {"role": "user", "content": "What's the weather in Paris?"},
  {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
      }
    ]
  },
  {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp\": 18, \"unit\": \"C\"}"}
]
```
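The tool messages in that exchange can be produced by a small dispatch loop. A minimal sketch, assuming each tool is a local Python callable that accepts keyword arguments; `run_tool_calls` and the `registry` mapping are hypothetical names.

```python
import json


def run_tool_calls(assistant_message: dict, registry: dict) -> list:
    """Execute each tool call and return the matching tool messages."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        handler = registry[function["name"]]
        # The arguments field is a JSON-encoded string, not an object.
        arguments = json.loads(function["arguments"])
        result = handler(**arguments)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # links the result back to its call
            "content": json.dumps(result),
        })
    return tool_messages
```

Append the assistant message and the returned tool messages to the conversation, then send it again to get the model's final answer.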
Tool format
Define tools using the standard function calling format. The format is the same regardless of which provider handles the request; the provider layer converts it automatically.
```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "The city name"}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```
Examples
curl
```bash
curl -X POST https://api.quantized.us/v1/chat/completions \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 128,
    "temperature": 0.7
  }'
```
Python (httpx)
```python
import httpx

response = httpx.post(
    "https://api.quantized.us/v1/chat/completions",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
)
data = response.json()
print(data["choices"][0]["message"]["content"])
```
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)
response = client.chat.completions.create(
    model="openai/gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
Response
```json
{
  "id": "gen-abc123",
  "object": "chat.completion",
  "model": "openai/gpt-4.1-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris.",
        "refusal": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33,
    "credits_used": 2400,
    "credits_remaining": 997600,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_write_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0
    }
  },
  "created": 1719000000
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| model | string | Model that generated the response |
| created | integer | Unix timestamp |
| choices | array | List of completion choices |
| choices[].index | integer | Choice index |
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string or null | The generated text (null when tool_calls present) |
| choices[].message.refusal | string or null | Model’s refusal message if it declined to answer |
| choices[].message.tool_calls | array or null | Tool calls made by the model (present when model calls tools) |
| choices[].message.reasoning | string or null | Reasoning text from thinking models (present when reasoning is enabled) |
| choices[].message.reasoning_details | array or null | Detailed reasoning steps (present when reasoning is enabled) |
| choices[].finish_reason | string | "stop", "length", or "tool_calls" |
| choices[].logprobs | object or null | Token log probabilities (present when top_logprobs is set in the request) |
| usage.prompt_tokens | integer | Input tokens |
| usage.completion_tokens | integer | Output tokens |
| usage.total_tokens | integer | Total tokens |
| usage.credits_used | integer | Micro-credits consumed |
| usage.credits_remaining | integer or null | Micro-credits remaining (null if unlimited) |
| usage.prompt_tokens_details | object or null | Token breakdown: cached_tokens, cache_write_tokens, audio_tokens |
| usage.completion_tokens_details | object or null | Token breakdown: reasoning_tokens, audio_tokens |
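Several of these fields are nullable, so reading them defensively avoids surprises. The sketch below uses a hypothetical `summarize_response` helper; it assumes one credit equals 1,000,000 micro-credits and that a finish_reason of "length" means the token limit cut the output short, as in the OpenAI API.

```python
def summarize_response(response: dict) -> dict:
    """Pull the commonly needed fields out of a chat completion response."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage") or {}
    prompt_details = usage.get("prompt_tokens_details") or {}
    return {
        "text": message.get("content"),  # None when the model called tools
        "tool_calls": message.get("tool_calls") or [],
        "refusal": message.get("refusal"),
        "truncated": choice["finish_reason"] == "length",
        # Assumption: micro-credits are millionths of a credit.
        "credits_used": usage.get("credits_used", 0) / 1_000_000,
        "cached_tokens": prompt_details.get("cached_tokens", 0),
    }
```

For the sample response above this gives the answer text, an empty tool call list, and 0.0024 credits used.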
Streaming
Set "stream": true to receive Server-Sent Events. See the Streaming guide for details and code examples.
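As a rough illustration of consuming the stream, the sketch below parses SSE lines that have already been read from the response. It assumes OpenAI-style chunks (text under choices[].delta.content) and a final `data: [DONE]` sentinel; neither detail is confirmed on this page, so treat the Streaming guide as authoritative. `iter_sse_chunks` and `collect_text` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE data line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas from a stream of SSE lines."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)
```

With httpx, the lines could come from `response.iter_lines()` inside an `httpx.stream("POST", ...)` context.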
Errors
| Status | Condition |
|---|---|
| 400 | Invalid request (missing model, bad field types) |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 422 | Unsupported parameter or invalid field structure |
| 503 | Provider unavailable |
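Client code can map these statuses to a handling decision. The sketch below is one possible policy, not documented behavior: it treats only 503 as worth retrying, and it assumes every error body uses the `{"error": {"message": ...}}` shape shown for the modality error. `explain_error` is a hypothetical helper.

```python
from typing import Optional

# Conditions copied from the Errors table.
ERROR_CONDITIONS = {
    400: "Invalid request (missing model, bad field types)",
    401: "Invalid or missing API key",
    402: "Insufficient credits",
    404: "Model not found",
    422: "Unsupported parameter or invalid field structure",
    503: "Provider unavailable",
}
# Assumption: only provider outages are transient enough to retry.
RETRYABLE_STATUSES = {503}


def explain_error(status: int, body: Optional[dict] = None) -> dict:
    """Describe an error response and say whether a retry is reasonable."""
    error = (body or {}).get("error")
    message = error.get("message") if isinstance(error, dict) else None
    return {
        "condition": ERROR_CONDITIONS.get(status, "Unknown error"),
        "message": message,
        "retryable": status in RETRYABLE_STATUSES,
    }
```

A caller might back off and retry when `retryable` is true and surface `message` to the user otherwise.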