Budgets and Spending

See also: Data Relationships, Pricing Catalog and Accounting, Request Lifecycle and Failure Modes, Identity and Access, Admin Control Plane, ADR: Spend Control Plane Reporting and Team Hard-Limit Enforcement

This page describes the live spend contract in the gateway.

Source of Truth

Ledger Contract

  • usage_cost_events is the canonical usage and spend ledger
  • request accounting is idempotent on (request_id, ownership_scope_key)
  • pricing is resolved from the internal pricing catalog and persisted into the ledger row
  • spend math uses fixed-point money and integer arithmetic

Pricing states are explicit:

  • priced
  • legacy_estimated
  • unpriced
  • usage_missing

Only priced and legacy_estimated rows count toward spend totals and budget windows.
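
A minimal sketch of how chargeable spend could be totaled with integer fixed-point money. The `LedgerRow` class and the `cost_10000` field name are illustrative assumptions; the scale of 10,000 is inferred from the `amount_10000` budget field rather than stated by the ledger schema.

```python
from dataclasses import dataclass

# Assumption: money is an integer count of 1/10000 currency units,
# mirroring the amount_10000 budget field.
SCALE = 10_000

# Only these pricing states count toward spend totals and budget windows.
CHARGEABLE_STATUSES = frozenset({"priced", "legacy_estimated"})


@dataclass(frozen=True)
class LedgerRow:
    """Hypothetical in-memory view of a usage_cost_events row."""
    request_id: str
    pricing_status: str
    cost_10000: int  # hypothetical column name


def chargeable_spend_10000(rows):
    """Sum the cost of rows whose pricing status is chargeable."""
    return sum(r.cost_10000 for r in rows if r.pricing_status in CHARGEABLE_STATUSES)


rows = [
    LedgerRow("req-1", "priced", 12_500),
    LedgerRow("req-2", "legacy_estimated", 2_500),
    LedgerRow("req-3", "unpriced", 0),
    LedgerRow("req-4", "usage_missing", 0),
]
total = chargeable_spend_10000(rows)
whole, fraction = divmod(total, SCALE)  # integer split, no floats involved
```

Keeping every amount as an integer avoids float rounding drift when many small request costs are summed inside a budget window.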

Runtime Enforcement

Pre-provider hard-limit checks run on the live request path for:

  • POST /v1/chat/completions
  • POST /v1/responses
  • POST /v1/embeddings

Budgets are enforced by owner scope:

  • user-owned API keys use the active user budget
  • team-owned API keys use the active team budget

Hard-limit behavior:

  • if current priced spend in the active window is already at or above the configured amount and hard_limit = true, the pre-provider check fails with budget_exceeded
  • after provider execution, if current priced spend plus the computed request cost would exceed the configured amount, the ledger write is blocked before the priced row is committed
  • the HTTP status is 429
  • no provider call occurs on the pre-provider rejection path
  • observability records pre-provider rejection as a budget outcome instead of provider execution

Two-Phase Enforcement

Budget enforcement has two phases:

  1. pre-provider blocking against current priced spend
  2. post-provider projected-cost blocking before the priced ledger row is inserted

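The split exists because the gateway does not know the final request cost before provider execution, so phase 1 can only compare already-recorded priced spend against the limit. Once usage and pricing are available, phase 2 can block a newly computed charge that would push the owner past the hard limit. This also explains the boundary difference: phase 1 rejects at or above the configured amount, while phase 2 rejects only when the projected total would exceed it. Duplicate requests are a separate case: accounting is idempotent on (request_id, ownership_scope_key), so a replayed request is a ledger no-op.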
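
The two checks can be sketched as follows. This is an illustration under stated assumptions, not the gateway's implementation: the `Budget` and `BudgetExceeded` names are hypothetical, and it is assumed that the post-provider check also applies only when hard_limit is true and surfaces the same budget_exceeded error.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical error type standing in for budget_exceeded (HTTP 429)."""
    http_status = 429


@dataclass(frozen=True)
class Budget:
    amount_10000: int
    hard_limit: bool


def pre_provider_check(budget, spent_10000):
    # Phase 1: reject when recorded spend is already at or above the limit.
    if budget.hard_limit and spent_10000 >= budget.amount_10000:
        raise BudgetExceeded("budget_exceeded")


def post_provider_check(budget, spent_10000, request_cost_10000):
    # Phase 2: reject when the newly computed charge would exceed the limit.
    # Assumption: this phase is also gated on hard_limit.
    if budget.hard_limit and spent_10000 + request_cost_10000 > budget.amount_10000:
        raise BudgetExceeded("budget_exceeded")


def is_blocked(check, *args):
    """Small helper for demonstration: True if the check raises."""
    try:
        check(*args)
    except BudgetExceeded:
        return True
    return False


budget = Budget(amount_10000=1_000_000, hard_limit=True)
```

Note the asymmetry: a request that lands spend exactly on the configured amount is allowed by phase 2, and the next request is then rejected by phase 1.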

Ownership scope keys:

  • user:
    • user:<user_id>
  • team:
    • team:<team_id>:actor:none

actor:none is the current team attribution contract. Acting-user attribution is still deferred.
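
A small sketch of building scope keys and using them for idempotent accounting. The `ownership_scope_key` function and the `InMemoryLedger` class are hypothetical stand-ins; only the key formats and the (request_id, ownership_scope_key) idempotency pair come from this page.

```python
def ownership_scope_key(owner_kind, owner_id):
    """Build the scope key for a user-owned or team-owned API key."""
    if owner_kind == "user":
        return f"user:{owner_id}"
    if owner_kind == "team":
        # actor:none is the current team attribution contract.
        return f"team:{owner_id}:actor:none"
    raise ValueError(f"unknown owner kind: {owner_kind!r}")


class InMemoryLedger:
    """Hypothetical ledger that ignores replays of the same request."""

    def __init__(self):
        self._rows = {}

    def record(self, request_id, scope_key, cost_10000):
        """Return True if a row was written, False for a duplicate."""
        key = (request_id, scope_key)
        if key in self._rows:
            return False  # duplicate request: no-op
        self._rows[key] = cost_10000
        return True

    def total(self):
        return sum(self._rows.values())


ledger = InMemoryLedger()
team_key = ownership_scope_key("team", "t-42")
first = ledger.record("req-1", team_key, 1_500)
replay = ledger.record("req-1", team_key, 1_500)
```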

Ledger Write Semantics

  • successful request handling writes a ledger row when provider usage can be normalized
  • if usage is missing, the row is marked usage_missing
  • if pricing cannot be matched exactly, the row is marked unpriced
  • unpriced and usage_missing rows stay visible in reporting but do not count toward spend totals
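
The status rules above can be sketched as a small classifier. The function name, its parameters, and the order in which the checks run are assumptions for illustration; legacy_estimated is not produced here because this page does not describe how that state is assigned.

```python
from typing import Optional


def classify_pricing_status(usage_tokens: Optional[int],
                            matched_price_10000: Optional[int]) -> str:
    """Pick a pricing status for a new ledger row.

    usage_tokens is None when the provider returned no usable usage.
    matched_price_10000 is None when no exact catalog match was found.
    """
    if usage_tokens is None:
        return "usage_missing"
    if matched_price_10000 is None:
        return "unpriced"
    return "priced"


def counts_toward_spend(status: str) -> bool:
    """Only priced and legacy_estimated rows are chargeable."""
    return status in ("priced", "legacy_estimated")
```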

Use request-lifecycle-and-failure-modes.md for the cross-cutting path from request execution to ledger state.

Budget Configuration Model

  • user_budgets stores active and inactive user budgets
  • team_budgets stores active and inactive team budgets
  • each table enforces one active budget per owner

Budget fields:

  • cadence
    • daily, weekly, or monthly
  • amount_10000
  • hard_limit
  • timezone

timezone is stored now, but enforcement windows still use UTC.

Declarative Budget Seed

Active user and team budgets can also come from config-backed seed inputs.

  • teams[*].budget reconciles the listed team's active budget
  • users[*].budget reconciles the listed user's active budget
  • removing a listed owner's budget block deactivates that active budget
  • historical budget rows remain historical; config only owns the active row
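
One way to picture the reconcile step, as a sketch only. The `reconcile_active_budgets` function and the dictionary shapes are hypothetical, and it is assumed that owners not listed in config are left untouched.

```python
def reconcile_active_budgets(listed, active):
    """Compare config-listed owners against currently active budgets.

    listed: owner_id -> budget dict, or None when the owner is listed
            without a budget block.
    active: owner_id -> currently active budget dict.
    Returns (upserts, deactivations).
    """
    upserts = {}
    deactivations = []
    for owner_id, budget in listed.items():
        if budget is None:
            # Budget block removed for a listed owner: deactivate.
            if owner_id in active:
                deactivations.append(owner_id)
        elif active.get(owner_id) != budget:
            upserts[owner_id] = budget
    # Assumption: owners absent from `listed` are not managed by config.
    return upserts, deactivations


monthly = {"cadence": "monthly", "amount_10000": 500_000, "hard_limit": True}
listed = {"team-a": monthly, "team-b": None}
active = {"team-b": monthly, "team-c": monthly}
upserts, deactivations = reconcile_active_budgets(listed, active)
```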

Budget Threshold Alerts

Budget alerts are durable records with tracked deliveries, not a fire-and-forget email side effect.

  • alerts are stored durably in budget_alerts
  • per-recipient delivery attempts are stored in budget_alert_deliveries
  • the initial threshold is fixed at 20% remaining budget
  • monthly cadence is supported end to end

Alert creation happens:

  • after a new chargeable ledger row is written
  • after a budget upsert, if the remaining budget is already at or below the threshold
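
A sketch of the threshold test in integer arithmetic, reading the 20% figure as a share of remaining budget. The function and constant names are hypothetical, and clamping overspend to zero remaining is an assumption.

```python
ALERT_THRESHOLD_PERCENT = 20  # fixed initial threshold: 20% remaining


def should_alert(amount_10000, spent_10000):
    """True when remaining budget is at or below 20% of the amount."""
    remaining = max(amount_10000 - spent_10000, 0)  # assumption: clamp at zero
    # Cross-multiply instead of dividing so the comparison stays exact.
    return remaining * 100 <= amount_10000 * ALERT_THRESHOLD_PERCENT
```

For a budget of 1,000,000 (100.0000 units), the alert condition first holds once spend reaches 800,000.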

Delivery behavior:

  • alert creation is durable-first
  • request handling writes alert rows and queued delivery rows first
  • a background dispatcher sends email later
  • delivery is single-attempt oriented in this slice
  • email is the only live channel today, but the schema is channel-aware

Recipient readiness:

  • user budgets notify the user email
  • team budgets notify active team owners or admins who have an email address

A recipient without an email address cannot be notified, so email readiness is part of the practical identity setup for alerting.

Spend Reporting APIs

Live admin spend APIs:

  • GET /api/v1/admin/spend/report
  • GET /api/v1/admin/spend/budgets
  • GET /api/v1/admin/spend/budget-alerts
  • PUT /api/v1/admin/spend/budgets/users/{user_id}
  • DELETE /api/v1/admin/spend/budgets/users/{user_id}
  • PUT /api/v1/admin/spend/budgets/teams/{team_id}
  • DELETE /api/v1/admin/spend/budgets/teams/{team_id}

These routes require an authenticated platform-admin session.

Spend Report Semantics

GET /api/v1/admin/spend/report is the summary endpoint behind the admin spend page.

Supported query parameters:

  • days
    • 7
    • 30
  • owner_kind
    • all
    • user
    • team
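
A sketch of building a report URL with the supported parameter values. The `spend_report_url` helper and the base URL are hypothetical, and rejecting other values on the client side is an assumption about how a caller might guard against bad input.

```python
from urllib.parse import urlencode

ALLOWED_DAYS = (7, 30)
ALLOWED_OWNER_KINDS = ("all", "user", "team")


def spend_report_url(base_url, days, owner_kind):
    """Build the spend report URL, rejecting unsupported values early."""
    if days not in ALLOWED_DAYS:
        raise ValueError(f"days must be one of {ALLOWED_DAYS}")
    if owner_kind not in ALLOWED_OWNER_KINDS:
        raise ValueError(f"owner_kind must be one of {ALLOWED_OWNER_KINDS}")
    query = urlencode({"days": days, "owner_kind": owner_kind})
    return f"{base_url.rstrip('/')}/api/v1/admin/spend/report?{query}"


# gateway.example is a placeholder host, not a real deployment.
url = spend_report_url("https://gateway.example", 30, "team")
```

The request itself still needs an authenticated platform-admin session, which this sketch does not cover.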

The report uses full UTC-day buckets for the selected range. Daily series are zero-filled so charts can render stable timelines even when no chargeable rows exist for a day.
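
Zero-filling can be sketched as below. The function name and the input shape are hypothetical, and it is assumed that the range ends on a given UTC day and runs backward for the requested number of days.

```python
from datetime import date, timedelta


def zero_filled_series(end_day, days, spend_by_day):
    """Return one (day, spend_10000) point per UTC day, oldest first.

    Days with no chargeable rows get an explicit 0 so charts keep a
    stable timeline.
    """
    start = end_day - timedelta(days=days - 1)
    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        series.append((day, spend_by_day.get(day, 0)))
    return series


points = zero_filled_series(date(2024, 6, 9), 7, {date(2024, 6, 5): 100})
```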

The response separates:

  • total request count
  • total spend for chargeable rows
  • owner breakdowns
  • model breakdowns
  • daily spend and request points
  • counts by pricing status, including priced, legacy_estimated, unpriced, and usage_missing

Only priced and legacy_estimated rows count toward spend totals. unpriced and usage_missing rows remain visible as accounting-quality signals.

Window Semantics

  • daily windows start at 00:00:00 UTC
  • weekly windows start at Monday 00:00:00 UTC
  • monthly windows start at 00:00:00 UTC on the first day of the month
  • Sunday 23:59:59 UTC is the last second of the weekly window that began the preceding Monday; the next window starts one second later
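
The window starts can be computed as in this sketch. The `window_start` function name is hypothetical; it expects a timezone-aware datetime and always works in UTC, matching the current enforcement behavior.

```python
from datetime import datetime, timedelta, timezone


def window_start(now, cadence):
    """Return the UTC start of the budget window containing `now`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    utc_now = now.astimezone(timezone.utc)
    midnight = utc_now.replace(hour=0, minute=0, second=0, microsecond=0)
    if cadence == "daily":
        return midnight
    if cadence == "weekly":
        # weekday() is 0 for Monday, so this steps back to Monday 00:00.
        return midnight - timedelta(days=midnight.weekday())
    if cadence == "monthly":
        return midnight.replace(day=1)
    raise ValueError(f"unknown cadence: {cadence!r}")


sunday_night = datetime(2024, 6, 9, 23, 59, 59, tzinfo=timezone.utc)
monday_start = datetime(2024, 6, 10, 0, 0, 0, tzinfo=timezone.utc)
```

Here `sunday_night` falls in the week that started on Monday 2024-06-03, while `monday_start` opens a new weekly window.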

Current Gaps

  • provider breakdown is not part of spend reporting v1
  • acting-user attribution for team-owned keys remains actor:none
  • timezone-aware budget windows are still deferred
  • hardened declarative SSO-backed identity matching remains deferred

What This Page Does Not Own