Docs / Gateway, Sidecar y Mesh / Mitrity LLM Gateway

Mitrity LLM Gateway

The Mitrity LLM Gateway governs the traffic your agents send to LLM providers. It is a TLS-terminating egress gateway deployed in your own cluster: an agent points its provider base_url at the gateway, the gateway terminates the agent's TLS with its own certificate, runs the full MITRITY decision engine over the parsed prompt, applies any DLP redaction to the request body in place, scans the model's response (including streamed responses) before the agent sees it, and re-originates a fresh TLS connection to the real provider.

This is the fourth deployment surface alongside Mitrity Gateway, Mitrity MCP Sidecar, and the Mitrity Mesh Authorizer -- the same decision engine at a different decision point. Where the Mesh Authorizer cannot rewrite a request body on the wire, the Mitrity LLM Gateway owns the body: matched secrets and PII are scrubbed before the prompt ever leaves your cluster -- true wire-redaction, in both directions.

Use it when your agents call Anthropic or OpenAI directly (not through an MCP tool) and you want mission-scope, threat-intel, and DLP governance on that traffic -- with usage and cost visibility per agent and per model.

What it governs

Every request to a configured provider path runs the same decision engine that powers the other MITRITY surfaces:

Capability	Status at the LLM Gateway
Prompt DLP -- block	Enforced (403; the prompt never reaches the provider)
Prompt DLP -- redact	Enforced as true wire-redaction -- matched content is scrubbed from the request body before it is forwarded
Response DLP -- block / redact	Enforced on the text the model generates, before it reaches the agent; streamed responses use the holdback-window scan (see Streaming holdback)
Intent / mission-scope	Enforced -- each prompt egress is evaluated as an outbound action (`llm:send:<provider>:<model>`) against the agent's declared mission
Threat-intel / injection scan	Enforced on the prompt content and the provider/model destination
Model access -- allowlist / denylist	Enforced -- a policy can restrict which providers/models an agent may call; an off-allowlist call is denied (403). See Guardrails
Per-call token cap	Enforced -- a policy can cap the output tokens a single call may request; an over-cap call is denied
Cost budget -- token / USD	Enforced -- a per-policy budget over a rolling window; when exceeded the gateway blocks (403), holds for approval (409), or alerts-only
Rate limit -- requests/min	Enforced -- a per-policy cap on LLM calls/min per agent; over-limit calls get 429
LLM usage & BOM	Recorded -- provider, model, token counts, redaction counts per agent, per call (metadata only; see LLM usage, cost & BOM)
Identity	Fail-closed -- a request with no resolvable identity is always denied

A request path with no configured provider returns 404 -- it is never forwarded anywhere.

Guardrails: model access, cost & rate

Beyond inspecting each prompt, the gateway enforces per-policy guardrails on LLM traffic. All of them are configured in the dashboard under Policies → (a policy) → LLM Guardrails, apply to every agent bound to that policy, and are enforced at the gateway before the call reaches the provider:

Model allowlists / denylists. Restrict which providers and models an agent may call -- e.g. allow only anthropic:claude-*, or deny openai:gpt-4o. Authored as llm:send:<provider>:<model> allow/deny rules; the policy's default action decides "allow all except denied" vs "deny all except allowed." An off-allowlist call is denied with a provider-shaped 403.
Per-call token caps. Cap the output tokens a single call may request (max_output_tokens). A request over the cap is denied before it reaches the provider.
Cost budgets (token / USD). A per-policy budget over a rolling day / week / month window, measured from actual usage. Set a token budget (exact), a USD budget (best-effort, priced from the model catalog), or both -- plus a warn threshold (default 80%) and an exceeded action:
- block -- over-budget calls are denied (403)
- hold -- over-budget calls are held for approval (409); approve to keep spending
- alert-only -- calls proceed; you are notified The dashboard shows a live burn-down (used vs limit, % consumed, current window) on the policy, and spend alerts fire to Slack/email once at the warn threshold and once when exceeded.
Rate limits. A per-policy cap on LLM calls per minute per agent; requests over the limit get a 429.

Budgets and rate limits fail open -- an agent with no budget or limit configured is never blocked (cost control is opt-in), unlike identity and DLP, which fail closed.

These fine-grained controls are specific to the LLM Gateway, which parses the request and response body. The Mesh Authorizer can enforce a coarse egress allowlist to provider hosts (e.g. block api.openai.com entirely) but cannot see the model, token counts, or spend -- route LLM traffic through the gateway for model / token / budget / rate governance.

Supported providers

Provider	API	Gateway path
Anthropic	Messages API	`/v1/messages`
OpenAI	Chat Completions API	`/v1/chat/completions`

Both providers are configured by default; remove one from the configuration to 404 its path. Each upstream URL must be https:// -- the gateway opens a fresh TLS connection toward the provider. You can point an upstream at a regional or proxy endpoint if your provider contract requires one.

Agents keep their own provider API key -- the gateway forwards it verbatim to the provider and only changes where the traffic flows. Pointing an agent at the gateway is a base-URL change plus trusting the gateway's CA:

# Anthropic SDKs:
export ANTHROPIC_BASE_URL=https://<gateway-host>:8443
# OpenAI SDKs (note the /v1 -- OpenAI clients append /chat/completions):
export OPENAI_BASE_URL=https://<gateway-host>:8443/v1
# Trust the gateway CA (or use your SDK's CA option):
export SSL_CERT_FILE=$PWD/mitrity-llm-gateway-ca.crt

This is not interception: the agent connects to the gateway by name and verifies the gateway's certificate against the gateway's CA, like any other HTTPS endpoint.

Identity: how the gateway knows who is calling

There are two identity paths, and the gateway fails closed when neither yields an identity:

SPIFFE via mTLS (primary). When the agents run in an Istio mesh, the gateway requires and verifies a client certificate on the TLS handshake; the verified certificate's spiffe:// identity (namespace + ServiceAccount) resolves the calling agent against the same control-plane binding used by the Mesh Authorizer.
MITRITY API key (fallback, for non-mesh agents). The agent presents its MITRITY-issued key in a configured header; the value resolves against the same control-plane index. The header is never forwarded to the provider and never logged. Agents on this path cannot present a client certificate, so client-certificate verification must be relaxed for them at install time.

A valid identity that maps to no governed agent is denied by default (an allow mode exists strictly as a migration escape hatch). An absent identity is always denied, regardless of configuration.

Mesh mode: one hop governs the whole fleet

The gateway runs in one of two identity modes, set by llmGateway.identityMode:

mesh (default) -- the centralized, N-to-1 egress hop. A single gateway Service governs every agent in the environment. It resolves each caller per request (SPIFFE mTLS, or the API-key fallback) against a (namespace, ServiceAccount) → agent index, then enforces that agent's policy -- its model allowlist, token caps, budgets, rate limits, and DLP. This is the mode for the shared in-cluster LLM-egress hop, and it is how mesh customers get body-level LLM governance the Mesh Authorizer cannot provide (it is L3/L4 and never sees the prompt).
colocated -- the 1-to-1 sidecar. The gateway serves a single configured agent and trusts a same-pod loopback call as that agent. Used when a gateway is deployed next to exactly one workload.

Mesh mode needs a mesh-edge credential. To govern the fleet the gateway must hold every served agent's policy, and it gets them from its own heartbeat. So in mesh mode llmGateway.agentId must be your environment's mesh-edge credential -- an agent you create with is_mesh_edge: true. A mesh-edge agent's heartbeat returns the mission profiles of every mesh-bound agent in its environment (fleet provisioning), which is what populates the resolution index. Point the gateway at a plain agent and it receives only that one profile, so every other caller resolves to nothing and is denied. Delivery is scoped to the mesh edge's own environment -- least privilege -- and the flag is settable only through the agent API, never by an agent self-registering. This is the same fleet-provisioning mechanism the Mesh Authorizer uses to serve many agents from one deployment.

Once installed, confirm the fleet is flowing: the gateway logs heartbeat synced mesh fleet with the mesh-bound profile count. A mesh gateway that never logs it is almost always pointed at a non-mesh-edge credential.

Response DLP and streaming

Response-phase DLP scans the text the model generates before it reaches the agent, complementing the request-phase prompt scan. It is on by default.

Non-streaming responses are buffered whole and scanned -- block, redact, or deliver. Fail-closed: a model output the gateway cannot fully read, parse, or (after a redact match) rewrite is never delivered.
Streaming (SSE) responses use a holdback-window scan: the gateway withholds the last few kilobytes of generated text from the agent so a secret straddling chunk boundaries can still be matched, redacted, or blocked before delivery. Stream framing (roles, tool calls, usage frames, pings) passes through untouched.

Streaming holdback: the bounded-window caveat

The holdback window is bounded (default 2 KiB, configurable from 256 bytes to 1 MiB) so the gateway never buffers an entire stream. The honest consequence: a single DLP match longer than the window that straddles the window edge can escape the streaming scan. Size the window above your longest expected pattern match -- a typical credential or PII token is far under 2 KiB. Non-streaming responses are unaffected; they are always scanned whole. Raising the window trades a little time-to-first-token latency for a wider straddle guarantee.

Privacy guarantee

Prompts and model responses never leave your cluster, except to the LLM provider you explicitly configured as the upstream. The gateway evaluates everything locally, with the same in-process decision engine as the other MITRITY binaries. Only verdict metadata -- decision, policy and pattern IDs, provider, model, redaction counts -- and token counts reach the MITRITY control plane. Prompt and response content is never included in events, logs, or the dashboard.

The agent's provider API key is forwarded only to the provider; the MITRITY identity header (API-key fallback) is never forwarded upstream and never logged.

Fail-closed behavior

The gateway is inline on the agent-to-provider path and fails closed:

A request with no resolvable identity, or an identity bound to no governed agent, is denied.
A mission-scope, DLP-block, or threat-intel verdict on the prompt denies the call with a provider-shaped error (Anthropic/OpenAI error JSON), so agent SDKs surface it cleanly.
A policy verdict requiring human approval is denied at the wire (a synchronous proxy has no pending state); approve the request in the dashboard approval queue and the agent retries.
Gateway down = governed traffic down. Agents whose base_url points at an unavailable gateway cannot reach the provider at all -- that is the fail-closed contract. Run at least two replicas (the default) and keep them close to the agents.

LLM usage, cost & BOM

Each governed call emits a usage record -- provider, model, input/output/total token counts, decision (allowed, blocked, or redacted), and redaction count -- keyed to the calling agent. These power the LLM Gateway page in the dashboard sidebar:

Per-call usage records: every LLM Gateway-governed call, filterable by agent, provider, model, decision, and time range. A blocked call is recorded with zero tokens -- it was never forwarded.
Token spend trends and estimated cost: usage aggregated by model, agent, or day. estimated_cost_usd is computed from a static price catalog keyed by (provider, model) that ships with platform releases. Prices are estimates, never billing data; a group's estimate is null when no call in it matched a catalog entry, and uncosted_calls counts the calls excluded for that reason.
LLM Bill of Materials (BOM): a per-agent model inventory -- one entry per distinct (agent, provider, model) ever observed through the gateway, with first-seen/last-seen timestamps and lifetime call and token totals. This answers "which agents talk to which models" the same way the AI Bill of Materials answers it for tools.

These same usage numbers drive the per-policy cost budgets, the burn-down panel, and the spend alerts in Guardrails -- measurement becomes enforcement.

No prompt or response content is ever included -- usage records are metadata only, consistent with the privacy guarantee above. The same data is available programmatically via the /llm-usage API endpoints.

Deployment

The Mitrity LLM Gateway ships as a Helm chart, published as a cosign-signed OCI artifact with every MITRITY edge release:

helm install mitrity-llm-gateway \
  oci://ghcr.io/mitrity-io/charts/mitrity-llm-gateway --version <X.Y.Z> \
  --namespace mitrity-system --create-namespace \
  --set llmGateway.apiKey=$MITRITY_API_KEY

Key facts:

Available from MITRITY edge release v0.13.0. The chart version tracks the release tag (vX.Y.Z → chart version X.Y.Z).
Off by default. The gateway capability is flag-gated in the edge binary; installing the chart is the opt-in.
Mesh mode wants a mesh-edge credential. In the default identityMode: mesh, set llmGateway.agentId to an agent created with is_mesh_edge: true so its heartbeat provisions the fleet (see Mesh mode above). Startup fails fast if agentId is empty in mesh mode.
cert-manager provisions the gateway's own server certificate on the default TLS path (you can bring your own TLS Secret instead). Every hostname agents will use in base_url must be on the certificate.
Istio is optional. The gateway does not require a mesh. When agents run in an Istio mesh, issue the gateway's certificate from the mesh trust domain so in-mesh agents present SPIFFE client certificates automatically -- no agent-side changes beyond the base URL.
Secure by default. The gateway refuses to open a plaintext listener on a non-loopback address -- an egress gate that sees prompt bodies never runs unencrypted on the network.

The chart README packaged with the release covers the full install, provider configuration, certificate setup, troubleshooting table, and signature/SBOM verification.

Integration Modes -- Mitrity Gateway and Mitrity MCP Sidecar
Mesh Enforcement (Istio) -- the in-mesh governance surface and its capability table
Per-Agent Identity in the Mesh -- the namespace + ServiceAccount binding the SPIFFE identity path uses
API Overview -- the /llm-usage endpoints for usage records, summaries, and the BOM