Changelog

Embeddings endpoint (May 2026)

Added

POST /v1/embeddings — OpenAI-compatible embeddings endpoint.
- Default provider: OpenAI Direct (new provider, slug openai).
- Alternative: OpenRouter via X-Quantized-Provider: openrouter. OpenRouter passthrough uses the OpenAI rate table for billing because its response does not include cost data.
- Supported models: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
- Accepted fields: model, input (string or list of strings), dimensions, encoding_format (only "float"), user.
- Strict validation (extra="forbid") — unknown fields return 422.
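As a rough illustration of the request shape described above, the following sketch builds the headers and JSON body for POST /v1/embeddings and rejects unknown fields on the client before the server would return 422. The helper name build_embeddings_request is hypothetical, authentication headers are omitted, and the checks are an approximation of the listed rules, not the server's actual validation.

```python
import json

# Field and model names are taken from the changelog entry above.
ALLOWED_FIELDS = {"model", "input", "dimensions", "encoding_format", "user"}
SUPPORTED_MODELS = {
    "text-embedding-3-small",
    "text-embedding-3-large",
    "text-embedding-ada-002",
}


def build_embeddings_request(body, provider=None):
    """Return (headers, json_body) for POST /v1/embeddings, or raise ValueError.

    Hypothetical client-side helper; the real server-side checks may differ.
    """
    unknown = set(body) - ALLOWED_FIELDS
    if unknown:
        # The server would answer 422 for these, so fail early on the client.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if body.get("model") not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {body.get('model')!r}")
    value = body.get("input")
    is_str_list = isinstance(value, list) and all(isinstance(v, str) for v in value)
    if not (isinstance(value, str) or is_str_list):
        # Token arrays (list[int]) are listed as deferred, so they fail here.
        raise ValueError("input must be a string or a list of strings")
    if body.get("encoding_format", "float") != "float":
        raise ValueError('encoding_format must be "float"')
    headers = {"Content-Type": "application/json"}
    if provider is not None:
        # Header name from the changelog; e.g. provider="openrouter".
        headers["X-Quantized-Provider"] = provider
    return headers, json.dumps(body)
```

Passing provider="openrouter" selects the alternative provider; leaving it unset relies on the default (OpenAI Direct).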

Out of scope (deferred)

Native Bedrock embeddings (/v1/aws-bedrock/embeddings) — Titan + Cohere — landing in a follow-up PR.
Token-array input (list[int] / list[list[int]]).
Multimodal ContentPart[] input.
encoding_format: "base64".
output_dtype quantized embeddings.
input_type (relevant once Cohere/Gemini providers are wired in).

Multimodal chat completions (April 2026)

Added

Chat Completions — new content part types:
- input_audio — base64 audio with format (wav/mp3/aiff/aac/ogg/flac/m4a/pcm16/pcm24)
- video_url — HTTPS URL or data:video/...;base64,... data URI
- file — PDF via file_data (HTTPS or data:application/pdf;base64,...) with optional filename; alternatively file_id
Model modality validation: chat-completion requests are now validated against the target model’s input_modality before dispatch. A mismatch (e.g. audio to a text-only model) returns 400 with a clear message instead of an opaque upstream error. file parts are exempt — they are handled by OpenRouter’s universal PDF parser across all models.
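The modality check described above might look roughly like the sketch below. The mapping from content part type to modality, the representation of input_modality as a set of strings, and the error wording are all assumptions made for illustration; only the part type names and the exemption for file parts come from the changelog.

```python
# Assumed mapping from content part type to the modality it requires.
# "file" is deliberately absent: the changelog says file parts are exempt.
PART_MODALITY = {
    "text": "text",
    "image_url": "image",
    "input_audio": "audio",
    "video_url": "video",
}


def check_modalities(messages, input_modality):
    """Return None if every part is supported, else an error string for a 400.

    input_modality is assumed to be a set such as {"text", "image"}.
    """
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            continue  # plain string content is treated as text
        for part in content or []:
            needed = PART_MODALITY.get(part.get("type"))
            if needed is not None and needed not in input_modality:
                return f"model does not accept {needed} input"
    return None
```

Under these assumptions, an input_audio part sent to a model whose input_modality is {"text"} yields an error string, while a file part passes regardless of the model.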

Provider support

OpenRouter: all three new content part types. Anthropic: text + image only (existing behavior unchanged).

v1.1 — API Lockdown (April 2026)

Strict parameter validation

The API now enforces strict validation on all request parameters. Unknown or unsupported fields are rejected with 422 Unprocessable Entity.

Chat Completions — removed parameters:
top_k, modalities, audio, web_search_options, metadata

Chat Completions — promoted parameters (now supported):
logprobs, top_logprobs, logit_bias, repetition_penalty, user, stream_options

Chat Completions — promoted message fields (now supported):
refusal, reasoning, reasoning_details (assistant role only)

Chat Completions — accepted message fields (not forwarded to providers):
annotations, audio, function_call (accepted from OpenAI SDK responses to avoid 422 in multi-turn conversations)

Chat Completions — removed message fields:
images (in request messages)

Responses API — removed parameters:
store, text, truncation, include, service_tier, background, top_k, metadata, user

Chat Completions — removed response fields:
system_fingerprint, choices[].message.annotations, choices[].message.audio, choices[].message.images

Chat Completions — promoted response fields (now supported):
choices[].logprobs, choices[].message.reasoning, choices[].message.reasoning_details

Strict content part validation

Message content arrays now accept only text and image_url content parts. Other types, such as input_audio, video_url, or file, are rejected.

Strict tool definition validation

The Chat Completions tools field is now validated structurally: each tool must follow either the OpenAI format (type: "function", function: {name, description, parameters}) or the Anthropic format (name, input_schema).
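To make the two accepted tool shapes concrete, here is a minimal sketch that classifies a tool definition as OpenAI-style or Anthropic-style. The function name classify_tool is hypothetical, and which keys are strictly required (here only function.name for the OpenAI shape, and name plus input_schema for the Anthropic shape) is an assumption; the changelog does not say.

```python
def classify_tool(tool):
    """Return "openai" or "anthropic" for a valid tool dict, else raise ValueError.

    Illustrative only: the required keys are assumed, not taken from the API.
    """
    if not isinstance(tool, dict):
        raise ValueError("tool must be an object")
    if tool.get("type") == "function":
        function = tool.get("function")
        if isinstance(function, dict) and isinstance(function.get("name"), str):
            return "openai"
        raise ValueError("OpenAI-format tool needs function.name")
    if isinstance(tool.get("name"), str) and isinstance(tool.get("input_schema"), dict):
        return "anthropic"
    raise ValueError("tool matches neither the OpenAI nor the Anthropic format")
```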

Parameter range validation

All generation parameters are validated at the API boundary with clear error messages:

max_tokens / max_completion_tokens / max_output_tokens: minimum 1
temperature: 0–2
top_p: 0–1
frequency_penalty / presence_penalty: −2 to 2

Out-of-range values return 422 with a message like: "Input should be greater than or equal to 1".
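The ranges listed above can be expressed as a small table-driven check. This is a sketch only: the function name range_errors is hypothetical, the "greater than or equal to" wording is modelled on the example message in the text, and the "less than or equal to" wording for upper bounds is an assumption.

```python
# (lower bound, upper bound) per parameter; None means unbounded on that side.
RANGES = {
    "max_tokens": (1, None),
    "max_completion_tokens": (1, None),
    "max_output_tokens": (1, None),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def range_errors(params):
    """Return {parameter: message} for every out-of-range value in params."""
    errors = {}
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is None:
            continue  # parameter not supplied
        if low is not None and value < low:
            errors[name] = f"Input should be greater than or equal to {low}"
        elif high is not None and value > high:
            # Upper-bound wording is assumed, not quoted from the API.
            errors[name] = f"Input should be less than or equal to {high}"
    return errors
```

An empty result means every supplied parameter is in range; a non-empty result corresponds to a 422 response.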

Type improvements

response_format validates structure: {"type": "json_object"} or {"type": "json_schema", ...}
reasoning validates effort: {"effort": "none" | "low" | "medium" | "high"} with optional exclude boolean
tool_choice validates values: "auto", "required", "none", or {"type": "function", "function": {"name": "..."}}
urls in /v1/fetch must now be a list of strings (list[string])
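The reasoning and tool_choice shapes listed above could be checked along the following lines. Both function names are hypothetical, and treating effort as a required key is an assumption; the accepted values themselves come from the changelog.

```python
EFFORTS = {"none", "low", "medium", "high"}
TOOL_CHOICE_STRINGS = {"auto", "required", "none"}


def valid_reasoning(value):
    """True for {"effort": <allowed value>} with an optional boolean "exclude"."""
    if not isinstance(value, dict):
        return False
    effort = value.get("effort")
    if not isinstance(effort, str) or effort not in EFFORTS:
        return False  # effort is assumed to be required
    return "exclude" not in value or isinstance(value["exclude"], bool)


def valid_tool_choice(value):
    """True for "auto", "required", "none", or a named-function object."""
    if isinstance(value, str):
        return value in TOOL_CHOICE_STRINGS
    if isinstance(value, dict) and value.get("type") == "function":
        function = value.get("function")
        return isinstance(function, dict) and isinstance(function.get("name"), str)
    return False
```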

Error message sanitization

Provider error messages are now sanitized before reaching the user. URLs, email addresses, and provider names are stripped from all client-facing error messages. Internal error details are preserved in the database for debugging.
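One plausible way to implement this kind of sanitization is a sequence of regular-expression substitutions, sketched below. The patterns, the provider list (drawn from providers named elsewhere in this changelog), and the choice to delete matches outright are assumptions; the actual sanitizer may redact differently.

```python
import re

# Provider names mentioned in this changelog; the real list is an assumption.
PROVIDER_NAMES = ["OpenRouter", "Anthropic", "OpenAI", "Exa", "Tavily"]

URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PROVIDER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, PROVIDER_NAMES)) + r")\b",
    re.IGNORECASE,
)


def sanitize(message):
    """Strip URLs, email addresses, and provider names from an error message."""
    # URLs and emails go first so provider names embedded in them vanish too.
    for pattern in (URL_RE, EMAIL_RE, PROVIDER_RE):
        message = pattern.sub("", message)
    # Collapse the gaps left behind by the removals.
    return re.sub(r"\s{2,}", " ", message).strip()
```

In this sketch the unsanitized message would be stored separately before sanitize is called, matching the note that internal details are preserved for debugging.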

v1 — Initial Release

Endpoints

POST /v1/chat/completions — OpenAI-compatible chat completions
POST /v1/responses — Stateful Responses API
POST /v1/web-search — Web search with structured results
POST /v1/fetch — Extract text content from URLs
GET /v1/models — List available models and pricing
GET /v1/license — Check license info and credit balance

Providers

OpenRouter — LLMs (default for chat completions, responses, models)
Anthropic — Claude models directly
Exa — Web search and content fetch (default)
Tavily — Web search and content fetch (alternative)

Features

OpenAI SDK compatibility
SSE streaming for chat completions and responses
Unified credit billing across all providers
JWT authentication with auto-provisioning
Per-institution configuration
Provider routing via X-Quantized-Provider header
