Appearance
Pricing Catalog and Accounting
See also: Configuration Reference, Provider API Compatibility, Data Relationships, Request Lifecycle and Failure Modes, Budgets and Spending, ADR: Hybrid Pricing Catalog from models.dev, ADR: Route-Level Provider API Compatibility Profiles
This page explains how the gateway turns provider usage into durable pricing records and why some successful requests are intentionally not charged.
Source of Truth
- pricing resolution and refresh logic:
- spend ledger writes:
- cache and pricing-row persistence:
- vendored fallback snapshot:
Catalog Layers
The runtime does not price directly from a live remote response on every request.
It uses three layers:
- vendored normalized fallback snapshot in the repo
- cached normalized remote snapshot in
pricing_catalog_cache - effective-dated
model_pricingrows used for historical lookup
That split keeps historical spend totals stable after an upstream catalog changes.
Upstream Source and Refresh
Current normalized pricing input comes from models.dev.
Operational shape:
- the runtime can refresh pricing metadata from the upstream feed
- cached snapshots are persisted in
pricing_catalog_cache - the vendored fallback keeps the repo bootstrappable when the remote source is unavailable
The durable historical charging source is still model_pricing, not the cache row.
Historical Pricing Contract
Pricing is effective-dated.
At request time, the gateway resolves one pricing row and copies provenance into usage_cost_events, including:
pricing_row_idpricing_provider_idpricing_model_id- copied rate fields
- pricing source metadata
Supported Pricing Paths
Current exact-only coverage is intentionally narrow:
openai_compatrequires a supportedpricing_provider_id- Vertex pricing is inferred from the upstream publisher prefix
google/...maps to Google Vertex pricinganthropic/...maps to Anthropic-on-Vertex pricing
Known coverage constraint:
- Anthropic-on-Vertex pricing is only supported for
location=global
Why Requests Become unpriced
A request can succeed and still become unpriced.
Common causes:
- missing provider pricing source
- unsupported
pricing_provider_id - unknown pricing model id
- unsupported Vertex publisher family
- unsupported Vertex location
- unsupported billing modifiers such as
service_tier - missing exact input or output rate coverage
This is fail-closed accounting behavior. Approximate billing is intentionally avoided in this slice.
usage_missing Versus unpriced
usage_missing- provider usage could not be normalized
unpriced- usage exists, but exact pricing could not be resolved safely
Both states stay visible in reporting, but neither counts toward spend totals or hard-limit windows.
Streaming Usage and Compatibility
Some OpenAI-compatible providers only emit streaming usage when the request includes stream_options.include_usage = true.
Routes can opt into that request shape with compatibility.openai_compat.supports_stream_usage. This improves usage capture for providers that support it, but it is not a billing guarantee:
- providers may omit final usage despite the option
- provider-specific usage counters may not fit the gateway accounting model
- successful requests can still become
usage_missingorunpriced
This compatibility option is Chat Completions-specific. Responses streams use the Responses event model and read usage from completed response events with response.usage.
The accounting model remains limited to prompt/input tokens, completion/output tokens, and total tokens in this slice.
Stored But Not Charged Yet
The pricing catalog can preserve more rate metadata than the runtime charges today.
| Catalog or provider signal | Current accounting status |
|---|---|
| prompt/input tokens | charged when exact pricing resolves |
| completion/output tokens | charged when exact pricing resolves |
| total tokens | stored for reporting and validation context |
| cache reads/writes | not charged yet |
| reasoning tokens or traces | not charged separately yet |
| image, audio, and file modality counters | not charged yet |
That distinction keeps the ledger conservative. The gateway should not infer spend for provider-specific counters until the pricing and request semantics are explicit. Richer token and cache accounting is tracked in issue #92.
Relationship to Request Flow
Route choice affects accounting, not only provider execution.
- the chosen provider decides the pricing family
- the chosen upstream model decides the exact lookup key
- route or request modifiers can make a request become
unpriced
Use request-lifecycle-and-failure-modes.md for the full cause-and-effect path.
Relationship to Budgets
Budget enforcement only uses priced totals.
pricedandlegacy_estimatedrows countunpricedandusage_missingrows do not count
Use budgets-and-spending.md for budget windows and spend APIs.