Embeddings

POST /v1/embeddings

Generate a vector embedding (or batch of embeddings) for one or more text inputs. Compatible with the OpenAI Embeddings API.

Headers

| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <api-key-or-jwt> |
| Content-Type | Yes | application/json |
| X-Quantized-Provider | No | Force a specific provider (openai, openrouter) |

Request body

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | | Model identifier (e.g. text-embedding-3-small) |
| input | string or array of strings | Yes | | Text(s) to embed. A single string returns one vector; a list returns one vector per element |
| dimensions | integer | No | null | Truncate output vector dimensionality. Supported on text-embedding-3-small and text-embedding-3-large. Must be >= 1 |
| encoding_format | string | No | "float" | Output encoding. v1 only accepts "float" |
| user | string | No | null | End-user identifier forwarded to the provider for abuse monitoring |

Strict validation

The serializer uses extra="forbid" — any field not listed above is rejected with 422. This keeps typos from being silently dropped. Notable fields not accepted in v1: input_type, output_dtype, token-array input, multimodal input.
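Because unknown fields fail with 422 instead of being ignored, it can help to check a request body before sending it. The following is a minimal client-side sketch that mirrors the field table above. The names ALLOWED_FIELDS, REQUIRED_FIELDS, and check_request are illustrative and not part of the API, and the server remains the source of truth for validation.

```python
# Client-side pre-check mirroring the documented request schema.
# ALLOWED_FIELDS, REQUIRED_FIELDS and check_request are hypothetical helper names.
ALLOWED_FIELDS = {"model", "input", "dimensions", "encoding_format", "user"}
REQUIRED_FIELDS = {"model", "input"}


def check_request(body: dict) -> list[str]:
    """Return the problems that would likely trigger a 422 for this body."""
    problems = []
    for name in sorted(REQUIRED_FIELDS - set(body)):
        problems.append(f"missing required field: {name}")
    for name in sorted(set(body) - ALLOWED_FIELDS):
        problems.append(f"unsupported field: {name}")

    # Only strings and lists of strings are documented; token arrays are not accepted.
    if "input" in body:
        value = body["input"]
        is_text = isinstance(value, str)
        is_text_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
        if not (is_text or is_text_list):
            problems.append("input must be a string or a list of strings")

    dims = body.get("dimensions")
    if dims is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(dims, bool) or not isinstance(dims, int) or dims < 1:
            problems.append("dimensions must be an integer >= 1")

    if body.get("encoding_format", "float") != "float":
        problems.append('encoding_format must be "float"')
    return problems
```

An empty list means the body matches the documented shape; for example, adding input_type to an otherwise valid body yields a single "unsupported field: input_type" entry.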

Examples

cURL (single string)

curl -X POST https://api.quantized.us/v1/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

cURL (batch)

curl -X POST https://api.quantized.us/v1/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": ["one", "two", "three"]
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-quantized-YOUR-KEY",
    base_url="https://api.quantized.us/v1",
)

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["one", "two", "three"],
)
for item in resp.data:
    print(item.index, len(item.embedding))

Python (httpx)

import httpx

response = httpx.post(
    "https://api.quantized.us/v1/embeddings",
    headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
    json={
        "model": "text-embedding-3-small",
        "input": "Hello world",
        "dimensions": 512,
    },
)
response.raise_for_status()  # surface 4xx/5xx here instead of a KeyError below
data = response.json()
print(len(data["data"][0]["embedding"]))  # 512

Response

{
  "object": "list",
  "model": "text-embedding-3-small",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10,
    "credits_used": 32,
    "credits_remaining": 999968
  }
}
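A response of this shape can be lined up with the original inputs through each entry's index. The sketch below works on an already-decoded response dict; vectors_in_input_order and pair_inputs are hypothetical helper names. Sorting by index is purely defensive here, on the assumption that a client should not depend on the array order alone.

```python
def vectors_in_input_order(response: dict) -> list[list[float]]:
    """Return the embeddings sorted by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def pair_inputs(inputs: list[str], response: dict) -> list[tuple[str, list[float]]]:
    """Pair each input text with its vector; fail loudly on a count mismatch."""
    vectors = vectors_in_input_order(response)
    if len(vectors) != len(inputs):
        raise ValueError(f"expected {len(inputs)} embeddings, got {len(vectors)}")
    # A list of pairs (not a dict) keeps duplicate input strings intact.
    return list(zip(inputs, vectors))
```

With the batch request from the examples, pair_inputs(["one", "two", "three"], resp_json) would return three (text, vector) pairs in request order.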

Response fields

| Field | Type | Description |
|---|---|---|
| object | string | Always "list" |
| model | string | Model id echoed from the request (with any provider prefix preserved) |
| data | array | One entry per input, ordered by index |
| data[].object | string | Always "embedding" |
| data[].index | integer | Position of this embedding in the input batch |
| data[].embedding | array of floats | The vector |
| usage.prompt_tokens | integer | Input token count (provider-reported) |
| usage.total_tokens | integer | Equal to prompt_tokens for embeddings (no output tokens) |
| usage.credits_used | integer | Micro-credits consumed by this request |
| usage.credits_remaining | integer or null | Micro-credits remaining (null for unlimited licenses) |

No streaming

Embeddings cannot be streamed on any provider. stream is not an accepted request field, so a body containing stream: true is rejected with 422 like any other unlisted field.

Models

| Model id | Native dimension | Accepts dimensions? | Public list rate |
|---|---|---|---|
| text-embedding-3-small | 1536 | Yes | $0.02 / 1M tokens |
| text-embedding-3-large | 3072 | Yes | $0.13 / 1M tokens |
| text-embedding-ada-002 | 1536 | No | $0.10 / 1M tokens |

ada-002 returns a versioned id

When you request text-embedding-ada-002, OpenAI’s response echoes the versioned id text-embedding-ada-002-v2 in the model field. Quantized’s billing rate-table is keyed on the unversioned id, so this has no effect on cost. Client code that round-trips response.model into a follow-up request should preserve whatever id OpenAI returned.

When routing through OpenRouter, prefix the model with openai/ (e.g. openai/text-embedding-3-small). The same rate table is applied via fallback because OpenRouter does not include cost data in embedding responses.
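To make the rate-table lookup concrete, here is a rough sketch that normalizes a model id (stripping an openai/ prefix and the ada-002 version suffix) and estimates cost at the public list rates above. The helper names are hypothetical, and the result is in USD at list price; how that maps to micro-credits is not specified here, so usage.credits_used remains the authoritative figure.

```python
# Public list rates from the Models table, in USD per 1M tokens.
LIST_RATE_USD_PER_1M = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}


def rate_key(model: str) -> str:
    """Map a request or response model id to its unversioned rate-table key."""
    prefix = "openai/"  # used when routing through OpenRouter
    if model.startswith(prefix):
        model = model[len(prefix):]
    if model == "text-embedding-ada-002-v2":  # versioned id echoed by OpenAI
        model = "text-embedding-ada-002"
    return model


def estimate_list_cost_usd(model: str, prompt_tokens: int) -> float:
    """Estimate the list-price cost in USD; raises KeyError for unknown models."""
    return LIST_RATE_USD_PER_1M[rate_key(model)] * prompt_tokens / 1_000_000
```

For instance, estimate_list_cost_usd("openai/text-embedding-3-small", 1_000_000) comes to roughly 0.02 USD, the same as the unprefixed id.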

Discovery via /v1/models

The three OpenAI embedding models are listed in GET /v1/models alongside chat models. To select only the embedding models, filter in either of two ways:

  • supported_features contains "embeddings" (recommended, semantic):
    embedding_models = [m for m in models if "embeddings" in m.get("supported_features", [])]
    
  • output_modality.text == false — embedding models declare no media-modality output because the response is a vector, not generated text. Chat models have output_modality.text == true.

The catalog entries also expose pricing in cost.prompt (micro-credits per token).
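Putting the two filters and the pricing field together, a sketch along these lines could build a price lookup from the catalog. It operates on a list of already-fetched catalog entries; the "id" key and the sample values in the usage note are assumptions for illustration, not documented catalog data.

```python
def embedding_models(models: list[dict]) -> list[dict]:
    """Select entries whose supported_features include "embeddings"."""
    return [m for m in models if "embeddings" in m.get("supported_features", [])]


def is_non_text_output(model: dict) -> bool:
    """Alternative check: embedding models declare output_modality.text == false."""
    return model.get("output_modality", {}).get("text") is False


def prompt_prices(models: list[dict]) -> dict[str, float]:
    """Map model id to cost.prompt (micro-credits per token) for embedding models."""
    # "id" is an assumed key name for the model identifier.
    return {
        m["id"]: m["cost"]["prompt"]
        for m in embedding_models(models)
        if "prompt" in m.get("cost", {})
    }
```

Given a hypothetical entry such as {"id": "text-embedding-3-small", "supported_features": ["embeddings"], "output_modality": {"text": False}, "cost": {"prompt": 2}}, both filters select it and prompt_prices returns {"text-embedding-3-small": 2}.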

Providers

| Provider | Slug | Default? |
|---|---|---|
| OpenAI Direct | openai | Yes |
| OpenRouter | openrouter | No (opt-in via header) |

# Force OpenRouter passthrough
curl -X POST https://api.quantized.us/v1/embeddings \
  -H "Authorization: Bearer sk-quantized-YOUR-KEY" \
  -H "X-Quantized-Provider: openrouter" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/text-embedding-3-small", "input": "hello"}'

Errors

| Status | Condition |
|---|---|
| 400 | Unknown X-Quantized-Provider slug |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits |
| 404 | Unknown model id (forwarded from the provider) |
| 422 | Validation error: missing required field, unsupported field, wrong type, or dimensions < 1 |
| 503 | Upstream provider unavailable (timeout, rate limit, auth failure on the upstream key) |

Empty input is accepted

An empty string "" (or a list containing one) is not rejected — OpenAI returns a valid embedding with prompt_tokens: 0. The transaction is recorded with zero cost.
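On the client side, the status codes in the Errors table split naturally into "fix the request" and "try again later". The sketch below is one possible policy, not documented behavior: treating only 503 as retryable and using exponential backoff are assumptions, and HINTS, should_retry, and backoff_delays are hypothetical names.

```python
# One-line remediation hints per documented status code.
HINTS = {
    400: "check the X-Quantized-Provider slug",
    401: "check the API key",
    402: "add credits",
    404: "check the model id",
    422: "fix the request body",
    503: "retry later",
}

# Assumption: only upstream unavailability is worth retrying automatically.
RETRYABLE = {503}


def should_retry(status: int) -> bool:
    """True when repeating the same request unchanged might succeed."""
    return status in RETRYABLE


def backoff_delays(attempts: int, base_seconds: float = 0.5) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ..."""
    return [base_seconds * 2 ** i for i in range(attempts)]
```

Since 503 also covers an auth failure on the upstream key, a small retry cap is probably sensible; backoff_delays(3) gives waits of 0.5, 1.0, and 2.0 seconds.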

Out of scope on this endpoint

The following are not accepted by /v1/embeddings (the OpenAI-shape endpoint) and may be added in a future release:

  • Token-array input (list[int] or list[list[int]])
  • Multimodal input (ContentPart[])
  • encoding_format: "base64"
  • output_dtype quantized embeddings
  • input_type / task_type task-conditioning fields

Native-shape sibling endpoints

For provider-specific fields the OpenAI shape doesn’t cover, use the native-shape endpoints:

  • /v1/aws-bedrock/embeddings — Amazon Titan v2 (with normalize, embeddingTypes) and Cohere v3 (with input_type, quantized output)
  • /v1/gemini/embeddings — Google Gemini (with task_type, output_dimensionality, title); automatic single/batch routing