Embeddings
POST /v1/embeddings
Generate a vector embedding (or batch of embeddings) for one or more text inputs. Compatible with the OpenAI Embeddings API.
Headers
| Header | Required | Description |
|---|---|---|
Authorization |
Yes | Bearer <api-key-or-jwt> |
Content-Type |
Yes | application/json |
X-Quantized-Provider |
No | Force a specific provider (openai, openrouter) |
Request body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
model |
string | Yes | — | Model identifier (e.g. text-embedding-3-small) |
input |
string or array of strings | Yes | — | Text(s) to embed. A single string returns one vector; a list returns one vector per element |
dimensions |
integer | No | null | Truncate output vector dimensionality. Supported on text-embedding-3-small and text-embedding-3-large. Must be >= 1 |
encoding_format |
string | No | "float" |
Output encoding. v1 only accepts "float" |
user |
string | No | null | End-user identifier forwarded to the provider for abuse monitoring |
The serializer uses extra="forbid" — any field not listed above is rejected with 422. This keeps typos from being silently dropped. Notable fields not accepted in v1: input_type, output_dtype, token-array input, multimodal input.
Examples
curl -X POST https://api.quantized.us/v1/embeddings \
-H "Authorization: Bearer sk-quantized-YOUR-KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog."
}'
curl -X POST https://api.quantized.us/v1/embeddings \
-H "Authorization: Bearer sk-quantized-YOUR-KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "text-embedding-3-small",
"input": ["one", "two", "three"]
}'
from openai import OpenAI
client = OpenAI(
api_key="sk-quantized-YOUR-KEY",
base_url="https://api.quantized.us/v1",
)
resp = client.embeddings.create(
model="text-embedding-3-small",
input=["one", "two", "three"],
)
for item in resp.data:
print(item.index, len(item.embedding))
import httpx
response = httpx.post(
"https://api.quantized.us/v1/embeddings",
headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
json={
"model": "text-embedding-3-small",
"input": "Hello world",
"dimensions": 512,
},
)
data = response.json()
print(len(data["data"][0]["embedding"])) # 512
Response
{
"object": "list",
"model": "text-embedding-3-small",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0123, -0.0456, 0.0789, ...]
}
],
"usage": {
"prompt_tokens": 10,
"total_tokens": 10,
"credits_used": 32,
"credits_remaining": 999968
}
}
Response fields
| Field | Type | Description |
|---|---|---|
object |
string | Always "list" |
model |
string | Model id echoed from the request (with any provider prefix preserved) |
data |
array | One entry per input, ordered by index |
data[].object |
string | Always "embedding" |
data[].index |
integer | Position of this embedding in the input batch |
data[].embedding |
array of floats | The vector |
usage.prompt_tokens |
integer | Input token count (provider-reported) |
usage.total_tokens |
integer | Equal to prompt_tokens for embeddings (no output tokens) |
usage.credits_used |
integer | Micro-credits consumed by this request |
usage.credits_remaining |
integer or null | Micro-credits remaining (null for unlimited licenses) |
Embedding endpoints are not streamable on any provider. stream: true is not accepted.
Models
| Model id | Native dimension | Accepts dimensions? |
Public list rate |
|---|---|---|---|
text-embedding-3-small |
1536 | Yes | $0.02 / 1M tokens |
text-embedding-3-large |
3072 | Yes | $0.13 / 1M tokens |
text-embedding-ada-002 |
1536 | No | $0.10 / 1M tokens |
When you request text-embedding-ada-002, OpenAI’s response echoes the versioned id text-embedding-ada-002-v2 in the model field. Quantized’s billing rate-table is keyed on the unversioned id, so this has no effect on cost. Client code that round-trips response.model into a follow-up request should preserve whatever id OpenAI returned.
When routing through OpenRouter, prefix the model with openai/ (e.g. openai/text-embedding-3-small). The same rate table is applied via fallback because OpenRouter does not include cost data in embedding responses.
Discovery via /v1/models
The three OpenAI embedding models are listed in GET /v1/models alongside chat models. Filter for embeddings either by:
supported_featurescontains"embeddings"(recommended, semantic):embedding_models = [m for m in models if "embeddings" in m.get("supported_features", [])]output_modality.text == false— embedding models declare no media-modality output because the response is a vector, not generated text. Chat models haveoutput_modality.text == true.
The catalog entries also expose pricing in cost.prompt (micro-credits per token).
Providers
| Provider | Slug | Default? |
|---|---|---|
| OpenAI Direct | openai |
Yes |
| OpenRouter | openrouter |
No (opt-in via header) |
# Force OpenRouter passthrough
curl -X POST https://api.quantized.us/v1/embeddings \
-H "Authorization: Bearer sk-quantized-YOUR-KEY" \
-H "X-Quantized-Provider: openrouter" \
-H "Content-Type: application/json" \
-d '{"model": "openai/text-embedding-3-small", "input": "hello"}'
Errors
| Status | Condition |
|---|---|
400 |
Unknown X-Quantized-Provider slug |
401 |
Invalid or missing API key |
402 |
Insufficient credits |
404 |
Unknown model id (forwarded from the provider) |
422 |
Validation error — missing required field, unsupported field, wrong type, or dimensions < 1 |
503 |
Upstream provider unavailable (timeout, rate limit, auth failure on the upstream key) |
An empty string "" (or a list containing one) is not rejected — OpenAI returns a valid embedding with prompt_tokens: 0. The transaction is recorded with zero cost.
The following are not accepted by /v1/embeddings (the OpenAI-shape endpoint) and may be added in a future release:
- Token-array input (
list[int]orlist[list[int]]) - Multimodal input (
ContentPart[]) encoding_format: "base64"output_dtypequantized embeddingsinput_type/task_typetask-conditioning fields
For provider-specific fields the OpenAI shape doesn’t cover, use the native-shape endpoints:
/v1/aws-bedrock/embeddings— Amazon Titan v2 (withnormalize,embeddingTypes) and Cohere v3 (withinput_type, quantized output)/v1/gemini/embeddings— Google Gemini (withtask_type,output_dimensionality,title); automatic single/batch routing