AWS Bedrock Embeddings
POST /v1/aws-bedrock/embeddings
Native passthrough to AWS Bedrock embedding models. Request and response shapes match Bedrock’s InvokeModel API byte-for-byte, with a usage block layered on top for credit tracking.
/v1/embeddingsUse /v1/embeddings when you want a unified, OpenAI-compatible shape. Use this endpoint when you need Bedrock-specific fields (normalize, embeddingTypes, Cohere’s input_type and embedding_types, quantized output) that the OpenAI shape doesn’t expose.
Headers
| Header | Required | Description |
|---|---|---|
Authorization |
Yes | Bearer <api-key-or-jwt> |
Content-Type |
Yes | application/json |
X-Quantized-Provider is not used for this endpoint — the provider is implicit (Bedrock).
Request body
The body shape varies by model. The router discriminates on the model field’s prefix:
amazon.titan-*→ Titan schemacohere.*→ Cohere schema
Any other prefix is rejected with 422 before the catalog gate runs.
Titan (amazon.titan-embed-text-v2:0)
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
model |
string | Yes | — | Must start with amazon.titan- |
inputText |
string | Yes | — | Single string input. Titan is single-input-only — pass one string per call |
dimensions |
integer | No | 1024 | One of 256, 512, 1024 |
normalize |
boolean | No | true | Return a unit vector when true |
embeddingTypes |
array | No | ["float"] |
Subset of "float", "binary" |
Cohere (cohere.embed-english-v3, cohere.embed-multilingual-v3)
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
model |
string | Yes | — | Must start with cohere. |
texts |
array of strings | Yes | — | One or more inputs. Cohere handles batching natively |
input_type |
string | Yes | — | One of search_document, search_query, classification, clustering |
embedding_types |
array | No | ["float"] |
Subset of "float", "int8", "uint8", "binary", "ubinary" |
truncate |
string | No | "NONE" |
One of "NONE", "START", "END" |
Both schemas use extra="forbid" — any field not listed above is rejected with 422. This prevents OpenAI-style fields (user, encoding_format, dimensions on Cohere) from being silently dropped or forwarded.
Examples
curl -X POST https://api.quantized.us/v1/aws-bedrock/embeddings \
-H "Authorization: Bearer sk-quantized-YOUR-KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "amazon.titan-embed-text-v2:0",
"inputText": "The quick brown fox jumps over the lazy dog.",
"dimensions": 512,
"normalize": true,
"embeddingTypes": ["float"]
}'
curl -X POST https://api.quantized.us/v1/aws-bedrock/embeddings \
-H "Authorization: Bearer sk-quantized-YOUR-KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "cohere.embed-english-v3",
"texts": ["first doc", "second doc", "third doc"],
"input_type": "classification",
"embedding_types": ["float"]
}'
import httpx
# Titan
resp = httpx.post(
"https://api.quantized.us/v1/aws-bedrock/embeddings",
headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
json={
"model": "amazon.titan-embed-text-v2:0",
"inputText": "Hello world",
"dimensions": 256,
},
)
data = resp.json()
print(len(data["embedding"])) # 256
# Cohere — multilingual, batch
resp = httpx.post(
"https://api.quantized.us/v1/aws-bedrock/embeddings",
headers={"Authorization": "Bearer sk-quantized-YOUR-KEY"},
json={
"model": "cohere.embed-multilingual-v3",
"texts": ["Hello", "Hola", "Bonjour"],
"input_type": "search_document",
},
)
data = resp.json()
print(len(data["embeddings"]["float"])) # 3
Response
Titan response
{
"embedding": [0.0123, -0.0456, 0.0789, ...],
"embeddingsByType": {
"float": [0.0123, -0.0456, 0.0789, ...]
},
"inputTextTokenCount": 11,
"usage": {
"credits_used": 3,
"credits_remaining": 999997
}
}
| Field | Type | Description |
|---|---|---|
embedding |
array of floats or null |
Top-level float vector. null when only non-float embeddingTypes were requested |
embeddingsByType |
object or null |
Per-type vectors when embeddingTypes was specified |
inputTextTokenCount |
integer | Provider-reported token count for inputText |
usage.credits_used |
integer | Micro-credits consumed |
usage.credits_remaining |
integer or null | Micro-credits remaining (null for unlimited licenses) |
Cohere response
Cohere v3 returns two distinct response shapes depending on whether the request included embedding_types. The router preserves both byte-for-byte.
With embedding_types (by-type dict):
{
"id": "abc-123",
"texts": ["first doc", "second doc"],
"embeddings": {
"float": [[0.0123, ...], [0.0456, ...]]
},
"response_type": "embeddings_by_type",
"usage": {
"credits_used": 2,
"credits_remaining": 999998
}
}
Without embedding_types (legacy list):
{
"id": "abc-123",
"texts": ["first doc"],
"embeddings": [
[0.0123, ...]
],
"response_type": "embeddings_floats",
"usage": {
"credits_used": 1,
"credits_remaining": 999999
}
}
| Field | Type | Description |
|---|---|---|
id |
string | Cohere request id |
texts |
array | Input texts echoed back, in order |
embeddings |
object or array | Object keyed by embedding_types (e.g. "float", "int8") when the request set embedding_types; otherwise a flat array of float vectors |
response_type |
string | "embeddings_by_type" when embedding_types was set; "embeddings_floats" otherwise |
Always include embedding_types: ["float"] in requests if you want the predictable by-type dict shape. The legacy list shape only appears when embedding_types is omitted — useful for backwards compatibility with older client code, but the by-type shape is recommended for new integrations.
Cohere’s response does not include a token count. Quantized estimates input tokens at ~4 characters per token (floored at 1). This is conservative and rarely under-bills for natural-language input.
Models
| Model id | Vendor | Dimensions | Batching | Public list rate |
|---|---|---|---|---|
amazon.titan-embed-text-v2:0 |
Amazon Titan | 256, 512, 1024 | Single-input only | $0.02 / 1M tokens |
cohere.embed-english-v3 |
Cohere | 1024 | Native batch | $0.10 / 1M tokens |
cohere.embed-multilingual-v3 |
Cohere | 1024 | Native batch | $0.10 / 1M tokens |
All three are seeded with supported_features: ["bedrock_embeddings"] in GET /v1/models. Filter for Bedrock embedding models:
bedrock_embed_models = [
m for m in models
if "bedrock_embeddings" in m.get("supported_features", [])
]
Providers
| Provider | Slug | Default? |
|---|---|---|
| AWS Bedrock | bedrock |
Yes (and only) |
This endpoint is Bedrock-only — X-Quantized-Provider is ignored. If you want to route OpenAI embedding models through Bedrock-style infrastructure, that’s not supported; use /v1/embeddings instead.
Errors
| Status | Condition |
|---|---|
401 |
Invalid or missing API key |
402 |
Insufficient credits |
404 |
Model id is Bedrock-prefixed but not seeded in the catalog |
422 |
Validation error — wrong field shape for the discriminated vendor; missing inputText (Titan) or texts/input_type (Cohere); unknown field; bad model prefix |
400 |
Upstream Bedrock ValidationException (e.g. unsupported dimensions value); sanitized message returned |
503 |
Upstream throttling, transient unavailability, or auth failure on the AWS credential |
The following are not accepted by v1 and may be added in a future release:
- Cohere v4 multimodal (
cohere.embed-multimodal-v4) - Titan multimodal embedding (
amazon.titan-embed-image-v1) - Cross-region routing (the AWS region is fixed via env)