Responses

POST /v1/responses

Generates a model response in the Responses API format. Supports `instructions`, multi-turn conversations via `previous_response_id`, tool calling, and streaming.

Headers

| Header | Required | Description |
| --- | --- | --- |
| `Authorization` | Yes | `Bearer <api-key-or-jwt>` |
| `Content-Type` | Yes | `application/json` |
| `X-Quantized-Provider` | No | Force a specific provider (default: `openrouter`) |

Request body

Required fields

| Field | Type | Description |
| --- | --- | --- |
| `model` | string | Model identifier (e.g., `openai/gpt-4.1-mini`) |
| `input` | string or array | The input text, or a list of input items |

Optional fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `instructions` | string | null | System-level instructions for the model |
| `max_output_tokens` | integer | null | Maximum number of tokens in the response |
| `temperature` | float | null | Sampling temperature (0–2) |
| `top_p` | float | null | Nucleus sampling threshold |
| `top_k` | integer | null | Top-k sampling |
| `frequency_penalty` | float | null | Frequency penalty (−2.0 to 2.0) |
| `presence_penalty` | float | null | Presence penalty (−2.0 to 2.0) |
| `previous_response_id` | string | null | ID of a previous response, for multi-turn conversations |
| `tools` | array | null | Tool/function definitions |
| `tool_choice` | any | null | Tool selection strategy |
| `parallel_tool_calls` | boolean | null | Allow parallel tool execution |
| `store` | boolean | null | Store the response for later retrieval |
| `stream` | boolean | false | Enable SSE streaming |
| `metadata` | object | null | Key-value metadata |
| `user` | string | null | User identifier |
| `reasoning` | object | null | Reasoning configuration |
| `text` | object | null | Text output configuration |
| `truncation` | string | null | Truncation strategy |
| `include` | array | null | Additional data to include in the response |
| `service_tier` | string | null | Service tier preference |
| `background` | boolean | null | Run the request in the background |
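As a hedged sketch of the `tools` field: a request body with one function tool might be built as below. The flattened `name`/`parameters` shape mirrors the OpenAI Responses function-tool convention and may vary by provider; `get_weather` is a hypothetical function, not part of this API.

```python
import json

# Hypothetical function tool. The flattened name/parameters shape follows
# the OpenAI Responses API convention; confirm against your provider.
payload = {
    "model": "openai/gpt-4.1-mini",
    "input": "What's the weather in Paris?",
    "tools": [
        {
            "type": "function",
            "name": "get_weather",  # hypothetical tool name
            "description": "Get current weather for a city",
            "parameters": {  # JSON Schema describing the arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",         # let the model decide when to call the tool
    "parallel_tool_calls": False,  # at most one tool call per turn
}

print(json.dumps(payload, indent=2))
```

With `tool_choice: "auto"`, the model may answer directly or emit a `function_call` output item (see the response fields below).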

Input format

The `input` field accepts either a plain string or an array of input items.

A plain string:

```json
{"input": "What is the capital of France?"}
```

An array of items:

```json
{"input": [
  {"role": "user", "content": "What is the capital of France?"}
]}
```
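Multi-turn conversations chain on `previous_response_id`: take the `id` from one response and pass it in the next request body. A minimal sketch (the `follow_up` helper is illustrative, not part of the API):

```python
def follow_up(prev_response: dict, user_text: str) -> dict:
    """Build the next request body, threading the prior response's id."""
    return {
        "model": prev_response["model"],
        "input": user_text,
        "previous_response_id": prev_response["id"],
    }

# Using the shape of the example response shown on this page:
prev = {"id": "resp-abc123", "model": "openai/gpt-4.1-mini"}
body = follow_up(prev, "And what is its population?")
print(body["previous_response_id"])  # resp-abc123
```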

Examples

cURL

```shell
curl -X POST https://api.quantized.us/v1/responses \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "input": "What is the capital of France?",
    "instructions": "You are a geography expert. Be concise.",
    "max_output_tokens": 128
  }'
```

Python (httpx)

```python
import httpx

response = httpx.post(
    "https://api.quantized.us/v1/responses",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "openai/gpt-4.1-mini",
        "input": "What is the capital of France?",
        "instructions": "You are a geography expert. Be concise.",
        "max_output_tokens": 128,
    },
)
data = response.json()
print(data["output"][0]["content"][0]["text"])
```

OpenAI SDK

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

response = client.responses.create(
    model="openai/gpt-4.1-mini",
    input="What is the capital of France?",
    instructions="You are a geography expert. Be concise.",
    max_output_tokens=128,
)
print(response.output[0].content[0].text)
```

Response

```json
{
  "id": "resp-abc123",
  "object": "response",
  "status": "completed",
  "model": "openai/gpt-4.1-mini",
  "output": [
    {
      "id": "msg-001",
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 20,
    "output_tokens": 8,
    "total_tokens": 28,
    "credits_used": 2000,
    "credits_remaining": 998000,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": 0
    }
  },
  "created_at": 1719000000
}
```

Response fields

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique response ID |
| `object` | string | Always `"response"` |
| `status` | string | `"completed"`, `"in_progress"`, or `"failed"` |
| `model` | string | Model that generated the response |
| `output` | array | List of output items |
| `output[].type` | string | `"message"` or `"function_call"` |
| `output[].role` | string | `"assistant"` for message items |
| `output[].content` | array | Content parts |
| `output[].content[].type` | string | `"output_text"` |
| `output[].content[].text` | string | The generated text |
| `usage.input_tokens` | integer | Input tokens |
| `usage.output_tokens` | integer | Output tokens |
| `usage.total_tokens` | integer | Total tokens |
| `usage.credits_used` | integer | Micro-credits consumed |
| `usage.credits_remaining` | integer | Micro-credits remaining |
| `created_at` | integer or string | Creation timestamp (Unix epoch seconds in the example above) |

Streaming

Set `"stream": true` to receive Server-Sent Events. The Responses endpoint uses typed events (`response.created`, `response.output_text.delta`, `response.completed`, etc.). See the Streaming guide for the full event format and code examples.
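As a rough sketch of consuming those typed events, text deltas can be accumulated until the completion event arrives. The `delta` payload field is an assumption mirroring OpenAI's Responses streaming events; the Streaming guide is authoritative.

```python
def collect_text(sse_events):
    """Accumulate text from response.output_text.delta events.

    `sse_events` yields (event_type, data_dict) pairs, e.g. as parsed
    from SSE `event:` / `data:` lines. The `delta` field name is an
    assumption based on OpenAI's Responses streaming event shape.
    """
    parts = []
    for event_type, data in sse_events:
        if event_type == "response.output_text.delta":
            parts.append(data.get("delta", ""))
        elif event_type == "response.completed":
            break
    return "".join(parts)

# Simulated event stream:
events = [
    ("response.created", {}),
    ("response.output_text.delta", {"delta": "The capital "}),
    ("response.output_text.delta", {"delta": "is Paris."}),
    ("response.completed", {}),
]
print(collect_text(events))  # The capital is Paris.
```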

Errors

| Status | Condition |
| --- | --- |
| 400 | Invalid request |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Model not found |
| 503 | Provider unavailable |
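A small sketch of client-side handling for these status codes. The retry policy is an assumption (retrying only 503 as transient), not a documented guarantee of this API.

```python
RETRYABLE = {503}  # provider unavailable; assumed transient and safe to retry

def describe_error(status: int) -> tuple[str, bool]:
    """Map a status code from the table above to (message, retryable)."""
    messages = {
        400: "Invalid request",
        401: "Invalid or missing API key",
        402: "Insufficient credits",
        404: "Model not found",
        503: "Provider unavailable",
    }
    return messages.get(status, "Unexpected error"), status in RETRYABLE

msg, retry = describe_error(503)
print(msg, retry)  # Provider unavailable True
```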