Appearance
Google Cloud Run OpenAI-Compatible Models
See also: Configuration Reference, Model Routing and API Behavior, Provider API Compatibility, Google Vertex AI, Pricing Catalog and Accounting
This page owns provider-specific configuration examples for private Cloud Run services that expose an OpenAI-compatible /v1 API, such as vLLM-hosted Gemma deployments.
Current Runtime Boundary
Use gcp_cloud_run_openai_compat when the upstream service:
- is deployed on Cloud Run | Google Cloud
- exposes OpenAI-compatible endpoints such as
/v1/chat/completions - requires Cloud Run IAM authentication with a Google-signed OIDC ID token
Use provider: openai_compat for arbitrary OpenAI-compatible endpoints with static bearer-token auth.
Use provider: gcp_vertex for Vertex AI publisher endpoints.
Provider
yaml
providers:
- id: gemma-cloud-run
type: gcp_cloud_run_openai_compat
base_url: https://gemma-service-abc-uc.a.run.app/v1
pricing_provider_id: google-vertex
auth:
mode: adc
display:
label: Gemma on Cloud Run
icon_key: vertexaibase_url must use https. When audience is omitted, the gateway derives the Cloud Run audience from the service origin. For example, https://gemma-service-abc-uc.a.run.app/v1 becomes https://gemma-service-abc-uc.a.run.app/.
Set audience when the service uses a Cloud Run custom audience:
yaml
providers:
- id: gemma-cloud-run-custom-audience
type: gcp_cloud_run_openai_compat
base_url: https://gemma.example.com/v1
audience: https://custom-audience.example.com
pricing_provider_id: google-vertex
auth:
mode: adcAuth Modes
WARNING
Avoid putting service-account JSON or short-lived ID tokens directly in gateway.yaml. Use mounted files or environment references, see Kubernetes and Helm - Secrets.
Application Default Credentials
In Google Cloud runtimes with an attached service account, the gateway uses the metadata server identity endpoint to mint an audience-scoped ID token. When ADC points at a service-account JSON file, the gateway uses the service account's OAuth token URI with a signed JWT assertion that includes target_audience.
yaml
auth:
mode: adcService Account JSON
Uses the service account's OAuth token URI with a signed JWT assertion that includes target_audience for the configured or derived audience.
yaml
auth:
mode: service_account
credentials_path: /var/run/secrets/gcp/service-account.jsonBearer Token
bearer should only be used in constrained, debugging environments where an admin has already minted an ID token. The token is treated as static and is not refreshed.
yaml
auth:
mode: bearer
token: env.CLOUD_RUN_ID_TOKENAuth Header
The default upstream auth header is Authorization: Bearer <token>.
Use auth_header: x_serverless_authorization when a Cloud Run proxy or frontend needs the original Authorization header for application-level auth:
yaml
providers:
- id: gemma-cloud-run
type: gcp_cloud_run_openai_compat
base_url: https://gemma-service-abc-uc.a.run.app/v1
pricing_provider_id: google-vertex
auth_header: x_serverless_authorization
auth:
mode: adcRoute Example
Cloud Run vLLM routes are OpenAI-compatible routes. Use route extra_body for vLLM/Gemma request controls that are additive provider parameters:
yaml
models:
- id: gemma-cloud-run
description: Gemma served by vLLM on private Cloud Run
tags: [cloud-run, gemma]
routes:
- provider: gemma-cloud-run
upstream_model: google/gemma-4-12b-it
capabilities:
chat_completions: true
responses: false
embeddings: false
extra_body:
chat_template_kwargs:
enable_thinking: true
skip_special_tokens: falseKeep route capability flags aligned with the deployed vLLM server and tested gateway behavior. The current provider path reuses OpenAI-compatible Chat Completions, streaming, Responses, and embeddings request handling, but a given Cloud Run service might only expose some of those endpoints.
IAM Notes
- Grant the calling gateway identity
roles/run.invokeron the receiving Cloud Run service. - The ID-token audience must match the Cloud Run service URL or a configured custom audience.
- Tokens are cached and refreshed before expiry.
auth.mode: adcis preferred for workloads running on Google Cloud.auth.mode: service_accountis useful when a mounted JSON key is the deployment constraint.
Pricing And Budgets
Cloud Run OpenAI-compatible providers require pricing_provider_id, matching ordinary openai_compat providers. The gateway uses this pricing identity plus upstream_model for catalog lookup. If the model is not present in the catalog, the request remains visible as unpriced; the gateway does not guess self-hosted inference cost.
Budgets are still configured for users, service accounts, and user-model scopes. Provider auth does not create a budget principal. See Budgets for user-facing setup.
Validation
For provider changes, run:
bash
cargo test -p gateway-providers id_token
cargo test -p gateway-providers x_serverless
cargo test -p gateway cloud_run_openai_compat
mise run lint
mise run docs:check