Embeddings

POST /v1/embeddings

Generate a vector embedding (or batch of embeddings) for one or more text inputs. Compatible with the OpenAI Embeddings API.

Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <api-key-or-jwt>`
`Content-Type`	Yes	`application/json`
`X-Quantized-Provider`	No	Force a specific provider (`openai`, `openrouter`)

Request body

Field	Type	Required	Default	Description
`model`	string	Yes	—	Model identifier (e.g. `text-embedding-3-small`)
`input`	string or array of strings	Yes	—	Text(s) to embed. A single string returns one vector; a list returns one vector per element
`dimensions`	integer	No	null	Truncate output vector dimensionality. Supported on `text-embedding-3-small` and `text-embedding-3-large`. Must be `>= 1`
`encoding_format`	string	No	`"float"`	Output encoding. v1 only accepts `"float"`
`user`	string	No	null	End-user identifier forwarded to the provider for abuse monitoring

Strict validation

The serializer uses `extra="forbid"`, so any field not listed above is rejected with `422` instead of being silently dropped. This surfaces typos immediately. Notable fields not accepted in v1: `input_type`, `output_dtype`, token-array input, and multimodal input.
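Because unknown fields fail the whole request, it can help to check a payload locally before sending it. The sketch below mirrors the rules in the request-body table; `check_embedding_payload` is a hypothetical client-side helper, not part of the API, and the server remains the source of truth.

```python
# Hypothetical client-side pre-check; field names come from the table above.
ALLOWED_FIELDS = {"model", "input", "dimensions", "encoding_format", "user"}
REQUIRED_FIELDS = {"model", "input"}


def check_embedding_payload(payload: dict) -> list[str]:
    """Return the problems that would likely trigger a 422 (empty if none)."""
    problems = []
    for name in sorted(REQUIRED_FIELDS - set(payload)):
        problems.append(f"missing required field: {name}")
    for name in sorted(set(payload) - ALLOWED_FIELDS):
        problems.append(f"unsupported field: {name}")

    value = payload.get("input")
    if value is not None:
        is_str_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if not isinstance(value, str) and not is_str_list:
            problems.append("input must be a string or a list of strings")

    dims = payload.get("dimensions")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if dims is not None and (isinstance(dims, bool) or not isinstance(dims, int) or dims < 1):
        problems.append("dimensions must be an integer >= 1")

    if payload.get("encoding_format", "float") != "float":
        problems.append('encoding_format must be "float"')
    return problems
```

An empty list means the payload passes these local checks; the server may still reject it for reasons not modelled here, such as an unknown model id.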

Examples

cURL (single string)

curl -X POST https://api.quantized.us/v1/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

cURL (batch)

curl -X POST https://api.quantized.us/v1/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": ["one", "two", "three"]
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["one", "two", "three"],
)
for item in resp.data:
    print(item.index, len(item.embedding))

Python (httpx)

import httpx

response = httpx.post(
    "https://api.quantized.us/v1/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "text-embedding-3-small",
        "input": "Hello world",
        "dimensions": 512,
    },
)
data = response.json()
print(len(data["data"][0]["embedding"]))  # 512

Response

{
  "object": "list",
  "model": "text-embedding-3-small",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10,
    "credits_used": 32,
    "credits_remaining": 999968
  }
}

Response fields

Field	Type	Description
`object`	string	Always `"list"`
`model`	string	Model id, normally echoed from the request with any provider prefix preserved; `text-embedding-ada-002` comes back with a versioned id (see Models)
`data`	array	One entry per input, ordered by `index`
`data[].object`	string	Always `"embedding"`
`data[].index`	integer	Position of this embedding in the input batch
`data[].embedding`	array of floats	The vector
`usage.prompt_tokens`	integer	Input token count (provider-reported)
`usage.total_tokens`	integer	Equal to `prompt_tokens` for embeddings (no output tokens)
`usage.credits_used`	integer	Micro-credits consumed by this request
`usage.credits_remaining`	integer or null	Micro-credits remaining (`null` for unlimited licenses)
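The field descriptions above translate into a small amount of client code. The following is a minimal sketch that works on an already-decoded response body; the helper names and the sample values are illustrative assumptions, and only the field names come from the table.

```python
def vectors_in_input_order(body: dict) -> list[list[float]]:
    """Return the embeddings sorted by `index`, so they line up with the inputs."""
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def credits_summary(body: dict) -> str:
    """Summarise the usage block as a short human-readable string."""
    usage = body["usage"]
    remaining = usage.get("credits_remaining")
    # `credits_remaining` is null (None in Python) for unlimited licenses.
    remaining_text = "unlimited" if remaining is None else str(remaining)
    return f"used {usage['credits_used']} micro-credits, {remaining_text} remaining"


# Made-up sample body with shortened vectors, deliberately out of order.
sample = {
    "object": "list",
    "model": "text-embedding-3-small",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "usage": {
        "prompt_tokens": 2,
        "total_tokens": 2,
        "credits_used": 32,
        "credits_remaining": None,
    },
}

vectors = vectors_in_input_order(sample)
print(len(vectors), credits_summary(sample))
```

Sorting by `index` is defensive: the table says entries are already ordered, so the sort only matters if a client reorders or merges responses itself.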

No streaming

Embeddings cannot be streamed on any provider. `stream` is not a listed request field, so `stream: true` is rejected like any other unlisted field (`422`).

Models

Model id	Native dimension	Accepts `dimensions`?	Public list rate
`text-embedding-3-small`	1536	Yes	$0.02 / 1M tokens
`text-embedding-3-large`	3072	Yes	$0.13 / 1M tokens
`text-embedding-ada-002`	1536	No	$0.10 / 1M tokens
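As a rough illustration of the list rates, the sketch below estimates a US dollar figure from a token count. It assumes the public list rate applies unchanged; the authoritative charge for a request is the `usage.credits_used` value in its response, and `estimate_list_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# Public list rates in USD per 1M tokens, copied from the table above.
LIST_RATE_USD_PER_MILLION = {
    "text-embedding-3-small": Decimal("0.02"),
    "text-embedding-3-large": Decimal("0.13"),
    "text-embedding-ada-002": Decimal("0.10"),
}


def estimate_list_cost_usd(model: str, prompt_tokens: int) -> Decimal:
    """Estimate the list-price cost in USD for a given number of input tokens."""
    rate = LIST_RATE_USD_PER_MILLION[model]  # KeyError for models not in the table
    return rate * prompt_tokens / Decimal(1_000_000)


print(estimate_list_cost_usd("text-embedding-3-small", 250_000))
```

`Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding error.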

ada-002 returns a versioned id

When you request `text-embedding-ada-002`, OpenAI’s response echoes the versioned id `text-embedding-ada-002-v2` in the `model` field. Quantized’s billing rate table is keyed on the unversioned id, so this has no effect on cost. Client code that round-trips `response.model` into a follow-up request should preserve whatever id OpenAI returned.

When routing through OpenRouter, prefix the model with `openai/` (e.g. `openai/text-embedding-3-small`). OpenRouter does not include cost data in embedding responses, so the same rate table is applied as a fallback.

Discovery via `/v1/models`

The three OpenAI embedding models are listed in `GET /v1/models` alongside chat models. Filter for embedding models in either of two ways:

- `supported_features` contains `"embeddings"` (recommended, semantic):

embedding_models = [m for m in models if "embeddings" in m.get("supported_features", [])]

- `output_modality.text == false`: embedding models declare no media-modality output because the response is a vector, not generated text. Chat models have `output_modality.text == true`.

The catalog entries also expose pricing in cost.prompt (micro-credits per token).
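To make the two filters concrete, here is a small sketch that applies both to an in-memory catalog. The entries are hypothetical: the `id` key, the `"chat"` feature value, and the chat model id are assumptions for illustration, and only `supported_features` and `output_modality.text` come from the text above.

```python
# Hypothetical catalog entries standing in for a decoded /v1/models listing.
models = [
    {
        "id": "text-embedding-3-small",
        "supported_features": ["embeddings"],
        "output_modality": {"text": False},
    },
    {
        "id": "example-chat-model",  # made-up id for illustration
        "supported_features": ["chat"],  # assumed feature name
        "output_modality": {"text": True},
    },
]

# Recommended: filter on the declared feature.
by_feature = [m["id"] for m in models if "embeddings" in m.get("supported_features", [])]

# Alternative: filter on the absence of text output.
# `is False` avoids matching entries where the key is simply missing.
by_modality = [m["id"] for m in models if m.get("output_modality", {}).get("text") is False]

print(by_feature, by_modality)
```

On this sample both filters select the same entry; the feature-based filter is the one the text recommends.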

Providers

Provider	Slug	Default?
OpenAI Direct	`openai`	Yes
OpenRouter	`openrouter`	No (opt-in via header)

# Force OpenRouter passthrough
curl -X POST https://api.quantized.us/v1/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "X-Quantized-Provider: openrouter" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/text-embedding-3-small", "input": "hello"}'
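The provider header and the `openai/` model prefix have to change together, so it can be convenient to build both in one place. The following is a sketch under that assumption; `build_embedding_request` is a hypothetical helper that only assembles the headers and JSON body and sends nothing.

```python
KNOWN_PROVIDERS = {"openai", "openrouter"}


def build_embedding_request(api_key, model, text, provider=None):
    """Return (headers, payload) for POST /v1/embeddings."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider is not None:
        if provider not in KNOWN_PROVIDERS:
            # The server answers 400 for unknown slugs; fail early instead.
            raise ValueError(f"unknown provider slug: {provider}")
        headers["X-Quantized-Provider"] = provider
    # OpenRouter expects the model id with an `openai/` prefix.
    if provider == "openrouter" and not model.startswith("openai/"):
        model = f"openai/{model}"
    return headers, {"model": model, "input": text}


headers, payload = build_embedding_request(
    "sk-quantized-YOUR-KEY", "text-embedding-3-small", "hello", provider="openrouter"
)
print(headers["X-Quantized-Provider"], payload["model"])
```

Leaving `provider` as `None` omits the header entirely, which selects the default provider (OpenAI Direct) according to the table above.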

Errors

Status	Condition
`400`	Unknown `X-Quantized-Provider` slug
`401`	Invalid or missing API key
`402`	Insufficient credits
`404`	Unknown model id (forwarded from the provider)
`422`	Validation error — missing required field, unsupported field, wrong type, or `dimensions < 1`
`503`	Upstream provider unavailable (timeout, rate limit, auth failure on the upstream key)
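A client can use the status code to decide whether a retry is worthwhile. The sketch below encodes the table as a lookup; treating only `503` as retryable and the backoff schedule itself are assumptions, not documented behavior, and both helper names are hypothetical.

```python
# Conditions copied from the table above.
ERROR_CONDITIONS = {
    400: "unknown X-Quantized-Provider slug",
    401: "invalid or missing API key",
    402: "insufficient credits",
    404: "unknown model id",
    422: "validation error",
    503: "upstream provider unavailable",
}

# Assumption: only 503 is transient; the other statuses need a changed request.
RETRYABLE = {503}


def classify_error(status: int) -> tuple[str, bool]:
    """Return (description, should_retry) for an HTTP status code."""
    description = ERROR_CONDITIONS.get(status, "unexpected status")
    return description, status in RETRYABLE


def backoff_delays(attempts: int, base_seconds: float = 0.5) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..."""
    return [base_seconds * (2 ** i) for i in range(attempts)]


print(classify_error(503), backoff_delays(3))
```

A caller would sleep for each delay in turn between attempts and give up once the schedule is exhausted.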

Empty input is accepted

An empty string `""` (or a list containing one) is not rejected: OpenAI returns a valid embedding with `prompt_tokens: 0`, and the transaction is recorded with zero cost.

Out of scope on this endpoint

The following are not accepted by `/v1/embeddings` (the OpenAI-shape endpoint) and may be added in a future release:

- Token-array input (`list[int]` or `list[list[int]]`)
- Multimodal input (`ContentPart[]`)
- `encoding_format: "base64"`
- `output_dtype` quantized embeddings
- `input_type` / `task_type` task-conditioning fields

Native-shape sibling endpoints

For provider-specific fields the OpenAI shape doesn’t cover, use the native-shape endpoints:

- `/v1/aws-bedrock/embeddings`: Amazon Titan v2 (with `normalize`, `embeddingTypes`) and Cohere v3 (with `input_type`, quantized output)
- `/v1/gemini/embeddings`: Google Gemini (with `task_type`, `output_dimensionality`, `title`); automatic single/batch routing