Mitrity LLM Gateway

The Mitrity LLM Gateway governs the traffic your agents send to LLM providers. It is a TLS-terminating egress gateway deployed in your own cluster: an agent points its provider base_url at the gateway, the gateway terminates the agent's TLS with its own certificate, runs the full MITRITY decision engine over the parsed prompt, applies any DLP redaction to the request body in place, scans the model's response (including streamed responses) before the agent sees it, and re-originates a fresh TLS connection to the real provider.

This is the fourth deployment surface alongside Mitrity Gateway, Mitrity MCP Sidecar, and the Mitrity Mesh Authorizer -- the same decision engine at a different decision point. Where the Mesh Authorizer cannot rewrite a request body on the wire, the Mitrity LLM Gateway owns the body: matched secrets and PII are scrubbed before the prompt ever leaves your cluster -- true wire-redaction, in both directions.

Use it when your agents call Anthropic or OpenAI directly (not through an MCP tool) and you want mission-scope, threat-intel, and DLP governance on that traffic -- with usage and cost visibility per agent and per model.

What it governs

Every request to a configured provider path runs the same decision engine that powers the other MITRITY surfaces:

CapabilityStatus at the LLM Gateway
Prompt DLP -- blockEnforced (403; the prompt never reaches the provider)
Prompt DLP -- redactEnforced as true wire-redaction -- matched content is scrubbed from the request body before it is forwarded
Response DLP -- block / redactEnforced on the text the model generates, before it reaches the agent; streamed responses use the holdback-window scan (see Streaming holdback)
Intent / mission-scopeEnforced -- each prompt egress is evaluated as an outbound action (llm:send:<provider>:<model>) against the agent's declared mission
Threat-intel / injection scanEnforced on the prompt content and the provider/model destination
LLM usage & BOMRecorded -- provider, model, token counts, redaction counts per agent, per call (metadata only; see LLM usage, cost & BOM)
IdentityFail-closed -- a request with no resolvable identity is always denied

A request path with no configured provider returns 404 -- it is never forwarded anywhere.

Supported providers

ProviderAPIGateway path
AnthropicMessages API/v1/messages
OpenAIChat Completions API/v1/chat/completions

Both providers are configured by default; remove one from the configuration to 404 its path. Each upstream URL must be https:// -- the gateway opens a fresh TLS connection toward the provider. You can point an upstream at a regional or proxy endpoint if your provider contract requires one.

Agents keep their own provider API key -- the gateway forwards it verbatim to the provider and only changes where the traffic flows. Pointing an agent at the gateway is a base-URL change plus trusting the gateway's CA:

# Anthropic SDKs:
export ANTHROPIC_BASE_URL=https://<gateway-host>:8443
# OpenAI SDKs (note the /v1 -- OpenAI clients append /chat/completions):
export OPENAI_BASE_URL=https://<gateway-host>:8443/v1
# Trust the gateway CA (or use your SDK's CA option):
export SSL_CERT_FILE=$PWD/mitrity-llm-gateway-ca.crt

This is not interception: the agent connects to the gateway by name and verifies the gateway's certificate against the gateway's CA, like any other HTTPS endpoint.

Identity: how the gateway knows who is calling

There are two identity paths, and the gateway fails closed when neither yields an identity:

  1. SPIFFE via mTLS (primary). When the agents run in an Istio mesh, the gateway requires and verifies a client certificate on the TLS handshake; the verified certificate's spiffe:// identity (namespace + ServiceAccount) resolves the calling agent against the same control-plane binding used by the Mesh Authorizer.
  2. MITRITY API key (fallback, for non-mesh agents). The agent presents its MITRITY-issued key in a configured header; the value resolves against the same control-plane index. The header is never forwarded to the provider and never logged. Agents on this path cannot present a client certificate, so client-certificate verification must be relaxed for them at install time.

A valid identity that maps to no governed agent is denied by default (an allow mode exists strictly as a migration escape hatch). An absent identity is always denied, regardless of configuration.

Response DLP and streaming

Response-phase DLP scans the text the model generates before it reaches the agent, complementing the request-phase prompt scan. It is on by default.

  • Non-streaming responses are buffered whole and scanned -- block, redact, or deliver. Fail-closed: a model output the gateway cannot fully read, parse, or (after a redact match) rewrite is never delivered.
  • Streaming (SSE) responses use a holdback-window scan: the gateway withholds the last few kilobytes of generated text from the agent so a secret straddling chunk boundaries can still be matched, redacted, or blocked before delivery. Stream framing (roles, tool calls, usage frames, pings) passes through untouched.

Streaming holdback: the bounded-window caveat

The holdback window is bounded (default 2 KiB, configurable from 256 bytes to 1 MiB) so the gateway never buffers an entire stream. The honest consequence: a single DLP match longer than the window that straddles the window edge can escape the streaming scan. Size the window above your longest expected pattern match -- a typical credential or PII token is far under 2 KiB. Non-streaming responses are unaffected; they are always scanned whole. Raising the window trades a little time-to-first-token latency for a wider straddle guarantee.

Privacy guarantee

Prompts and model responses never leave your cluster, except to the LLM provider you explicitly configured as the upstream. The gateway evaluates everything locally, with the same in-process decision engine as the other MITRITY binaries. Only verdict metadata -- decision, policy and pattern IDs, provider, model, redaction counts -- and token counts reach the MITRITY control plane. Prompt and response content is never included in events, logs, or the dashboard.

The agent's provider API key is forwarded only to the provider; the MITRITY identity header (API-key fallback) is never forwarded upstream and never logged.

Fail-closed behavior

The gateway is inline on the agent-to-provider path and fails closed:

  • A request with no resolvable identity, or an identity bound to no governed agent, is denied.
  • A mission-scope, DLP-block, or threat-intel verdict on the prompt denies the call with a provider-shaped error (Anthropic/OpenAI error JSON), so agent SDKs surface it cleanly.
  • A policy verdict requiring human approval is denied at the wire (a synchronous proxy has no pending state); approve the request in the dashboard approval queue and the agent retries.
  • Gateway down = governed traffic down. Agents whose base_url points at an unavailable gateway cannot reach the provider at all -- that is the fail-closed contract. Run at least two replicas (the default) and keep them close to the agents.

LLM usage, cost & BOM

Each governed call emits a usage record -- provider, model, input/output/total token counts, decision (allowed, blocked, or redacted), and redaction count -- keyed to the calling agent. These power the LLM Gateway page in the dashboard sidebar:

  • Per-call usage records: every LLM Gateway-governed call, filterable by agent, provider, model, decision, and time range. A blocked call is recorded with zero tokens -- it was never forwarded.
  • Token spend trends and estimated cost: usage aggregated by model, agent, or day. estimated_cost_usd is computed from a static price catalog keyed by (provider, model) that ships with platform releases. Prices are estimates, never billing data; a group's estimate is null when no call in it matched a catalog entry, and uncosted_calls counts the calls excluded for that reason.
  • LLM Bill of Materials (BOM): a per-agent model inventory -- one entry per distinct (agent, provider, model) ever observed through the gateway, with first-seen/last-seen timestamps and lifetime call and token totals. This answers "which agents talk to which models" the same way the AI Bill of Materials answers it for tools.

No prompt or response content is ever included -- usage records are metadata only, consistent with the privacy guarantee above. The same data is available programmatically via the /llm-usage API endpoints.

Deployment

The Mitrity LLM Gateway ships as a Helm chart, published as a cosign-signed OCI artifact with every MITRITY edge release:

helm install mitrity-llm-gateway \
  oci://ghcr.io/mitrity-io/charts/mitrity-llm-gateway --version <X.Y.Z> \
  --namespace mitrity-system --create-namespace \
  --set llmGateway.apiKey=$MITRITY_API_KEY

Key facts:

  • Available from MITRITY edge release v0.13.0. The chart version tracks the release tag (vX.Y.Z → chart version X.Y.Z).
  • Off by default. The gateway capability is flag-gated in the edge binary; installing the chart is the opt-in.
  • cert-manager provisions the gateway's own server certificate on the default TLS path (you can bring your own TLS Secret instead). Every hostname agents will use in base_url must be on the certificate.
  • Istio is optional. The gateway does not require a mesh. When agents run in an Istio mesh, issue the gateway's certificate from the mesh trust domain so in-mesh agents present SPIFFE client certificates automatically -- no agent-side changes beyond the base URL.
  • Secure by default. The gateway refuses to open a plaintext listener on a non-loopback address -- an egress gate that sees prompt bodies never runs unencrypted on the network.

The chart README packaged with the release covers the full install, provider configuration, certificate setup, troubleshooting table, and signature/SBOM verification.

Related Documentation

Mitrity LLM Gateway — Documentation | MITRITY