Changelog

Embeddings endpoint (May 2026)

Added

  • POST /v1/embeddings — OpenAI-compatible embeddings endpoint.
    • Default provider: OpenAI Direct (new provider, slug openai).
    • Alternative: OpenRouter via X-Quantized-Provider: openrouter. OpenRouter passthrough uses the OpenAI rate table for billing because its response does not include cost data.
    • Supported models: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002.
    • Accepted fields: model, input (string or list of strings), dimensions, encoding_format (only "float"), user.
    • Strict validation (extra="forbid") — unknown fields return 422.
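
A minimal client-side sketch of the request shape described above. It only builds the request and does not send it. The base URL, the Bearer authorization scheme, and the helper name build_embeddings_request are assumptions for illustration; the changelog specifies only the path, the accepted fields, and the X-Quantized-Provider header.

```python
import json

# Hypothetical base URL; the changelog does not state the real host.
BASE_URL = "https://api.example.com"

# Fields the changelog lists as accepted by POST /v1/embeddings.
ALLOWED_FIELDS = {"model", "input", "dimensions", "encoding_format", "user"}


def build_embeddings_request(token, model, texts, provider=None, **extra):
    """Return (url, headers, body) for POST /v1/embeddings without sending it."""
    payload = {"model": model, "input": texts, **extra}
    unknown = set(payload) - ALLOWED_FIELDS
    if unknown:
        # The server would answer 422 for these; fail early on the client.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if payload.get("encoding_format", "float") != "float":
        raise ValueError('encoding_format must be "float"')
    headers = {
        "Content-Type": "application/json",
        # Bearer scheme is an assumption; the changelog only mentions JWT auth.
        "Authorization": f"Bearer {token}",
    }
    if provider is not None:
        # Omit the header to use the default provider.
        headers["X-Quantized-Provider"] = provider
    return f"{BASE_URL}/v1/embeddings", headers, json.dumps(payload).encode()
```

Calling build_embeddings_request("tok", "text-embedding-3-small", ["a", "b"], provider="openrouter") would route the request through OpenRouter. Passing a deferred field such as input_type raises ValueError before anything is sent.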

Out of scope (deferred)

  • Native Bedrock embeddings (/v1/aws-bedrock/embeddings) — Titan + Cohere — landing in a follow-up PR.
  • Token-array input (list[int] / list[list[int]]).
  • Multimodal ContentPart[] input.
  • encoding_format: "base64".
  • output_dtype quantized embeddings.
  • input_type (relevant once Cohere/Gemini providers are wired in).

Multimodal chat completions (April 2026)

Added

  • Chat Completions — new content part types:
    • input_audio — base64 audio with format (wav/mp3/aiff/aac/ogg/flac/m4a/pcm16/pcm24)
    • video_url — HTTPS URL or data:video/...;base64,... data URI
    • file — PDF via file_data (HTTPS or data:application/pdf;base64,...) with optional filename; alternatively file_id
  • Model modality validation: chat-completion requests are now validated against the target model’s input_modality before dispatch. A mismatch (e.g. audio to a text-only model) returns 400 with a clear message instead of an opaque upstream error. file parts are exempt — they are handled by OpenRouter’s universal PDF parser across all models.
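
The modality check can be sketched roughly as follows. This is an illustration rather than the service's actual code: the modality names, the shape of input_modality (assumed here to be a list of strings), and the function name check_modalities are all assumptions.

```python
# Assumed mapping from content part type to modality name; the changelog
# does not list the actual modality identifiers.
PART_MODALITY = {
    "text": "text",
    "image_url": "image",
    "input_audio": "audio",
    "video_url": "video",
}


def check_modalities(messages, input_modality):
    """Return None if every content part fits the model, else an error string."""
    allowed = set(input_modality)
    for message in messages:
        content = message.get("content")
        if content is None or isinstance(content, str):
            continue  # plain-string content carries text only
        for part in content:
            part_type = part.get("type")
            if part_type == "file":
                continue  # file parts are exempt from the check
            modality = PART_MODALITY.get(part_type)
            if modality not in allowed:
                # A caller would turn this into a 400 response.
                return f"model does not accept {modality or part_type} input"
    return None
```

A caller would run this before dispatch and return the string in a 400 response, so the mismatch never reaches the upstream provider.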

Provider support

OpenRouter: all three new content part types (input_audio, video_url, file). Anthropic: text + image only (existing behavior unchanged).

v1.1 — API Lockdown (April 2026)

Strict parameter validation

The API now enforces strict validation on all request parameters. Unknown or unsupported fields are rejected with 422 Unprocessable Entity.

Chat Completions — removed parameters:
top_k, modalities, audio, web_search_options, metadata

Chat Completions — promoted parameters (now supported):
logprobs, top_logprobs, logit_bias, repetition_penalty, user, stream_options

Chat Completions — promoted message fields (now supported):
refusal, reasoning, reasoning_details (assistant role only)

Chat Completions — accepted message fields (not forwarded to providers):
annotations, audio, function_call (accepted from OpenAI SDK responses to avoid 422 in multi-turn conversations)
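
One way to picture "accepted but not forwarded" is a filter applied to each message before the request goes upstream. The sketch below is an assumption about how that could look, with a hypothetical helper name; the changelog does not describe the actual mechanism.

```python
# Fields the changelog lists as accepted but not forwarded to providers.
NOT_FORWARDED = {"annotations", "audio", "function_call"}


def strip_unforwarded(messages):
    """Return copies of the messages without the accepted-only fields."""
    return [
        {key: value for key, value in message.items() if key not in NOT_FORWARDED}
        for message in messages
    ]
```

This lets a client echo an assistant message from a previous SDK response back into the conversation unchanged, while the provider only sees the fields it supports.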

Chat Completions — removed message fields:
images (in request messages)

Responses API — removed parameters:
store, text, truncation, include, service_tier, background, top_k, metadata, user

Chat Completions — removed response fields:
system_fingerprint, choices[].message.annotations, choices[].message.audio, choices[].message.images

Chat Completions — promoted response fields (now supported):
choices[].logprobs, choices[].message.reasoning, choices[].message.reasoning_details

Strict content part validation

Message content arrays now accept only text and image_url content parts. Other types, such as input_audio, video_url, or file, are rejected in this release; the later multimodal chat completions release listed above adds them.

Strict tool definition validation

The Chat Completions tools field is now validated structurally: each tool must follow either the OpenAI format (type: "function", function: {name, description, parameters}) or the Anthropic format (name, input_schema).
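
A rough sketch of how the two accepted tool shapes could be told apart. Which keys are strictly required is an assumption here (the changelog names the keys but not their optionality), and tool_format is a hypothetical helper.

```python
def tool_format(tool):
    """Classify a tool definition as "openai" or "anthropic", else raise ValueError."""
    if not isinstance(tool, dict):
        raise ValueError("tool must be an object")
    if tool.get("type") == "function":
        function = tool.get("function")
        # Assumption: only function.name is treated as mandatory.
        if isinstance(function, dict) and isinstance(function.get("name"), str):
            return "openai"
        raise ValueError("OpenAI-format tool needs function.name")
    if isinstance(tool.get("name"), str) and isinstance(tool.get("input_schema"), dict):
        return "anthropic"
    raise ValueError("tool matches neither the OpenAI nor the Anthropic format")
```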

Parameter range validation

All generation parameters are validated at the API boundary with clear error messages:

  • max_tokens / max_completion_tokens / max_output_tokens: minimum 1
  • temperature: 0–2
  • top_p: 0–1
  • frequency_penalty / presence_penalty: −2 to 2

Out-of-range values return 422 with a message like: "Input should be greater than or equal to 1".
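
The ranges above can be expressed as a small table-driven check. This is a standard-library sketch, not the service's validator; it assumes the bounds are inclusive and follows the wording pattern of the message quoted above.

```python
# (lower bound, upper bound); None means unbounded on that side.
# Bounds are assumed to be inclusive.
RANGES = {
    "max_tokens": (1, None),
    "max_completion_tokens": (1, None),
    "max_output_tokens": (1, None),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}


def range_errors(params):
    """Return {field: message} for every value outside its documented range."""
    errors = {}
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is None:
            continue  # parameter not supplied
        if low is not None and value < low:
            errors[name] = f"Input should be greater than or equal to {low}"
        elif high is not None and value > high:
            errors[name] = f"Input should be less than or equal to {high}"
    return errors
```

An empty result means the parameters pass; a non-empty one would map to a 422 response listing each offending field.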

Type improvements

  • response_format validates structure: {"type": "json_object"} or {"type": "json_schema", ...}
  • reasoning validates effort: {"effort": "none" | "low" | "medium" | "high"} with optional exclude boolean
  • tool_choice validates values: "auto", "required", "none", or {"type": "function", "function": {"name": "..."}}
  • urls in /v1/fetch now requires list[string]

Error message sanitization

Provider error messages are now sanitized before reaching the user. URLs, email addresses, and provider names are stripped from all client-facing error messages. Internal error details are preserved in the database for debugging.
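
As an illustration of the kind of scrubbing described, the sketch below removes URLs, email addresses, and provider names with regular expressions. The provider list, the patterns, and the function name sanitize_error are assumptions; the real sanitizer may differ.

```python
import re

# Hypothetical provider list; the real set of stripped names is not given.
PROVIDER_NAMES = ("OpenRouter", "Anthropic", "OpenAI", "Exa", "Tavily")

URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PROVIDER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, PROVIDER_NAMES)) + r")\b", re.IGNORECASE
)


def sanitize_error(message):
    """Strip URLs, email addresses, and provider names from an error message."""
    # URLs go first so a provider name inside a URL is removed with it.
    for pattern in (URL_RE, EMAIL_RE, PROVIDER_RE):
        message = pattern.sub("", message)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s{2,}", " ", message).strip()
```

The unsanitized message would be stored separately for debugging, and only the return value would be shown to the client.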


v1 — Initial Release

Endpoints

  • POST /v1/chat/completions — OpenAI-compatible chat completions
  • POST /v1/responses — Stateful Responses API
  • POST /v1/web-search — Web search with structured results
  • POST /v1/fetch — Extract text content from URLs
  • GET /v1/models — List available models and pricing
  • GET /v1/license — Check license info and credit balance

Providers

  • OpenRouter — LLMs (default for chat completions, responses, models)
  • Anthropic — Claude models directly
  • Exa — Web search and content fetch (default)
  • Tavily — Web search and content fetch (alternative)

Features

  • OpenAI SDK compatibility
  • SSE streaming for chat completions and responses
  • Unified credit billing across all providers
  • JWT authentication with auto-provisioning
  • Per-institution configuration
  • Provider routing via X-Quantized-Provider header