DLP Pattern Authoring

DLP pattern authoring is the content layer of MITRITY DLP: it governs what data is allowed to flow through your agents. Its companion, destination allowlists, is the destination layer, which governs where data can flow.

The Two-Layer Model

A DLP pattern is a reusable definition — a regex, a list of literal strings, or a built-in classifier. A pattern doesn't do anything on its own. To enforce, you attach the pattern to a policy with a binding, which carries the per-policy settings: direction, action, priority, enabled flag. The same pattern can be bound to many policies, each with different settings.

dlp_patterns (definitions)
   │ M:N via bindings
   ▼
policy_dlp_bindings (per-policy: direction + action + priority + enabled)
   │
   ▼
agents.policy_id → policies → bindings → patterns

This separation matters in practice: the security team curates a tenant-wide catalogue of patterns once, and individual policy owners decide which patterns apply to their agents and how strict the enforcement should be.
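The relationship above can be sketched as two record types. This is a minimal illustration, not the actual MITRITY schema: the class names, field types, and the `bindings_for_policy` helper are assumptions; only the binding fields and their defaults come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Tenant-wide definition; does nothing until it is bound to a policy."""
    id: str
    pattern_type: str  # "regex", "string_list", or "classifier"


@dataclass
class Binding:
    """Per-policy attachment carrying the enforcement settings."""
    policy_id: str
    pattern_id: str
    direction: str = "both"   # documented default
    action: str = "flag"      # documented default
    priority: int = 50        # documented default
    enabled: bool = True


def bindings_for_policy(policy_id, bindings):
    """Hypothetical helper: the enabled bindings attached to one policy."""
    return [b for b in bindings if b.policy_id == policy_id and b.enabled]
```

Because the settings live on the binding, one `Pattern` can appear in many `Binding` rows with different direction, action, and priority values.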

Where to Find It

Navigate to /app/policies, expand a policy row, and open the DLP sub-tab. The tab has two sections:

Built-in patterns — the MITRITY-maintained catalogue covering common PII and credential formats.
Custom patterns — patterns your tenant has authored. Editors and above can create, edit, and delete entries here.

Each section lists the bindings currently attached to this policy. Add a binding by selecting a pattern from the pattern picker; the binding inherits sensible defaults (direction both, action flag, priority 50) which you can tune inline.

Built-in Pattern Catalogue

MITRITY ships a curated set of detectors covering the most common PII and credential formats. These are maintained by MITRITY — customers cannot edit the underlying detection logic. You bind them and configure direction, action, and priority on the binding.

Pattern	Type	Category	Detects
`ssn_us`	classifier	pii	US SSNs with structural validation (area not 000/666/9XX; group not 00; serial not 0000)
`personnummer_se`	classifier	pii	Swedish personal numbers (YYMMDD + 4 digits, real date, Luhn-style checksum)
`credit_card`	classifier	pii	Credit cards 13–19 digits, Luhn-validated
`iban`	classifier	pii	IBAN with mod-97 checksum
`email`	regex	pii	RFC 5322 simplified
`phone`	classifier	pii	E.164 + common national formats
`ipv4`	regex	pii	Dotted-quad IPv4 addresses
`aws_access_key`	regex	credentials	`AKIA[0-9A-Z]{16}`
`github_token`	regex	credentials	`ghp_` / `gho_` / `ghu_` / `ghs_` / `ghr_` prefixes
`generic_api_key`	classifier	credentials	Entropy + prefix heuristic. Low-confidence — recommended `flag` action only

The classifier-backed built-ins do more than a regex match — they apply structural validation (Luhn, mod-97, real-date checks) to keep false positives manageable. The generic_api_key classifier is intentionally fuzzy; bind it with action=flag so you keep visibility without blocking legitimate traffic.
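To make "structural validation" concrete, here is a sketch of the two public checksum algorithms named in the catalogue, Luhn and IBAN mod-97. It illustrates the algorithms only; MITRITY's classifier code is not shown on this page, and the function names are hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check over the digits of `number` (separators are ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:  # card length range from the catalogue
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_mod97_valid(iban: str) -> bool:
    """IBAN check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the result mod 97 to equal 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]
    return int("".join(str(int(c, 36)) for c in rearranged)) % 97 == 1
```

A candidate that looks like a card number or IBAN but fails the checksum is discarded, which is how these detectors avoid firing on arbitrary digit runs.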

Custom Patterns

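Customers can author two pattern types, regex and string_list. A third type, classifier, is reserved for built-ins and is listed last for completeness.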

`regex`

A single regular expression with an optional flags field. Only the i flag (case-insensitive) is accepted. Every regex is:

Compile-tested at save time — invalid regexes are rejected with the underlying parse error.
ReDoS-fuzzed with a 10 KB pathological input under a 100 ms budget. Patterns that time out are rejected before they can ever run in production.

{
  "name": "internal_project_id",
  "pattern_type": "regex",
  "pattern_data": { "pattern": "PRJ-[A-Z0-9]{8}", "flags": "" },
  "category": "internal"
}
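The save-time checks on flags and compilation can be approximated locally before you submit a pattern. This is a sketch under assumptions: it uses Python's `re` module, whose dialect may differ from the engine MITRITY runs, and it does not reproduce the ReDoS fuzzing step. The function name and return shape are hypothetical.

```python
import re

ALLOWED_FLAGS = {"", "i"}  # only the case-insensitive flag is accepted


def validate_regex_pattern(pattern_data: dict):
    """Return (compiled_regex, None) on success or (None, error_message)."""
    flags = pattern_data.get("flags", "")
    if flags not in ALLOWED_FLAGS:
        return None, f"unsupported flags: {flags!r}"
    try:
        compiled = re.compile(
            pattern_data["pattern"],
            re.IGNORECASE if flags == "i" else 0,
        )
    except re.error as exc:
        # Surface the underlying parse error, as the save-time check does.
        return None, f"invalid regex: {exc}"
    return compiled, None
```

Running this against the `internal_project_id` example returns a compiled pattern; an unterminated character class such as `PRJ-[A-Z` returns the parse error instead.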

`string_list`

A literal list of strings with an optional case_insensitive flag. No regex semantics — characters like ., *, and ? are matched literally. Best for competitor names, internal product codenames, legal-hold phrases, project keywords.

{
  "name": "marketing_competitors",
  "pattern_type": "string_list",
  "pattern_data": { "strings": ["CompetitorA", "CompetitorB"], "case_insensitive": true },
  "category": "competitor"
}
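Literal matching can be sketched as follows. This is an illustration of the semantics described above, not the edge implementation; the function name and the `(position, length)` return shape are assumptions. `re.escape` is used so that every character in an entry is matched literally.

```python
import re


def find_string_list_matches(text, strings, case_insensitive=False):
    """Return sorted (position, length) pairs for every literal occurrence."""
    flags = re.IGNORECASE if case_insensitive else 0
    hits = []
    for s in strings:
        if not s:  # skip empty entries, which would match everywhere
            continue
        # re.escape neutralises ., *, ? and other regex metacharacters.
        for m in re.finditer(re.escape(s), text, flags):
            hits.append((m.start(), m.end() - m.start()))
    return sorted(hits)
```

With this sketch, the entry `v1.0` matches the text `v1.0` but not `v1x0`, because the dot is not treated as a wildcard.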

`classifier`

Reserved for MITRITY-maintained built-ins. POST or PUT with pattern_type=classifier returns 400 Bad Request.

Bindings — Direction, Action, Priority

A binding is the configurable attachment between a policy and a pattern. Four fields control its behaviour:

direction — one of inbound (data flowing into the agent, e.g. tool responses), outbound (data leaving the agent, e.g. tool arguments), or both. Defaults to both.
action — one of block (deny the call), redact (replace each match with [REDACTED-<pattern_name>] before the payload is forwarded), or flag (audit-only, no payload modification). Defaults to flag.
priority — integer, higher evaluates first, first match wins. Defaults to 50. Use this to ensure high-confidence patterns shadow lower-confidence ones on the same payload.
enabled — boolean. Disable a binding temporarily without deleting it.
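The four fields combine roughly as in the sketch below: disabled bindings and bindings for the other direction are skipped, the rest are tried from highest priority to lowest, and the first match wins. The function name, the dict representation, and the `matches` callback are hypothetical. Tie-breaking between equal priorities is not specified on this page; this sketch simply keeps input order.

```python
def select_binding(bindings, direction, matches):
    """Pick the binding that decides the outcome for one payload.

    bindings:  list of dicts with pattern_id, direction, action,
               priority, enabled
    direction: "inbound" or "outbound" for the payload being inspected
    matches:   callable(pattern_id) -> bool, True if the pattern matches
    """
    candidates = [
        b for b in bindings
        if b.get("enabled", True)
        and b.get("direction", "both") in (direction, "both")
    ]
    # Higher priority evaluates first; sorted() is stable, so equal
    # priorities keep their input order (an assumption, see above).
    for b in sorted(candidates, key=lambda b: b.get("priority", 50), reverse=True):
        if matches(b["pattern_id"]):
            return b
    return None
```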

Worked Example

The same marketing_competitors pattern, two different policies, two different behaviours:

Marketing policy: bind with direction=outbound, action=block. Agents can research competitors (tool responses come in freely) but cannot publish anything containing competitor names.
Support policy: bind with direction=both, action=redact. Support replies containing competitor names get auto-redacted before they reach the customer; the same applies to incoming customer messages that mention competitors in case the agent quotes them back.

That reuse — one pattern definition, multiple policy bindings with different intent — is the value of the two-layer model.

Direction Inference

The Edge Node has to decide which side of an MCP exchange to inspect. It infers inbound vs outbound from the action type using a heuristic: action types whose name contains any of send, post, put, write, upload, push, publish, or forward classify as outbound. Everything else classifies as inbound.

This is edge behaviour, not part of the API contract — the heuristic may evolve as we observe more tool categories. If you're not sure how a particular tool action classifies, bind with direction=both and the binding fires on either side of the exchange.
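The heuristic is easy to approximate for a quick local check. The keyword list below is taken from the description above; the function name and the case-insensitive comparison are assumptions, and because the Edge Node's behaviour may change, treat this as a sketch rather than a specification.

```python
OUTBOUND_KEYWORDS = (
    "send", "post", "put", "write", "upload", "push", "publish", "forward",
)


def infer_direction(action_type: str) -> str:
    """Classify an action type as outbound if its name contains a keyword."""
    name = action_type.lower()  # case-insensitivity is an assumption
    if any(keyword in name for keyword in OUTBOUND_KEYWORDS):
        return "outbound"
    return "inbound"
```

Substring matching can surprise you: under this sketch a hypothetical action named `compute:run` classifies as outbound because "compute" contains "put". That kind of edge case is a good reason to bind with direction=both when in doubt.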

Audit Events

One rule is non-negotiable: DLP audit events never store the raw matched content. The matched substring is the most sensitive part of the request, the very thing the policy exists to protect, so it must not be persisted in the audit store.

Every DLP audit event carries enough structural information to investigate a match without persisting the matched substring itself:

{
  "id": "evt_...",
  "tenant_id": "...",
  "agent_id": "...",
  "policy_id": "...",
  "pattern_id": "ssn_us",
  "pattern_name": "SSN (US)",
  "direction": "outbound",
  "action_taken": "blocked",
  "detection_reason": "pattern_match",
  "position": 247,
  "length": 11,
  "tool_pattern_matched": "drive:create",
  "occurred_at": "2026-05-26T10:14:00Z"
}

No value, no matched_text, no other field carrying raw content. The boundary between the detection engine and the audit reporter strips any such field defensively — even if a future engine change accidentally tried to attach raw content, the reporter would drop it before persistence.
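One way to build such a defensive boundary is an allowlist of known-safe fields, sketched below. The allowlist approach and the function name are assumptions; this page only states that raw-content fields are stripped, not how. The field names are the ones shown in the sample event.

```python
ALLOWED_AUDIT_FIELDS = frozenset({
    "id", "tenant_id", "agent_id", "policy_id", "pattern_id",
    "pattern_name", "direction", "action_taken", "detection_reason",
    "position", "length", "tool_pattern_matched", "occurred_at",
})


def sanitize_audit_event(event: dict) -> dict:
    """Keep only known structural fields; anything else is dropped."""
    return {k: v for k, v in event.items() if k in ALLOWED_AUDIT_FIELDS}
```

An allowlist fails closed: a newly introduced field such as `matched_text` is dropped by default instead of needing to be anticipated in a blocklist.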

Mapping Binding Action to Edge Behaviour and Audit Outcome

Binding action	Edge behaviour	`action_taken` value
`block`	Tool call denied with JSON-RPC error returned to the caller	`blocked`
`redact`	Match replaced with `[REDACTED-<pattern_name>]` before payload forwarded downstream	`redacted`
`flag`	Match passes through unchanged; event recorded for review	`flagged`

action_taken reuses the v1 vocabulary so legacy ingest paths (SIEM forwarders, downstream alerting) keep working without changes. The new detection_reason='pattern_match' value distinguishes v2 binding-driven matches from v1 classifier reasons (sensitive_exfil, volume_anomaly, accumulation, unauthorized_destination), which continue to fire from the existing detection paths.
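The mapping in the table can be sketched as a single function. It is illustrative only: the function name and return shape are hypothetical, and the real edge returns a JSON-RPC error on block rather than `None`.

```python
import re


def apply_action(action, pattern_name, regex, payload):
    """Return (payload_to_forward, action_taken) for one bound pattern.

    payload_to_forward is None when the call is denied; action_taken is
    None when the pattern does not match at all.
    """
    if not regex.search(payload):
        return payload, None
    if action == "block":
        return None, "blocked"
    if action == "redact":
        # A function replacement avoids backslash handling in the marker.
        marker = f"[REDACTED-{pattern_name}]"
        return regex.sub(lambda m: marker, payload), "redacted"
    if action == "flag":
        return payload, "flagged"
    raise ValueError(f"unknown action: {action!r}")
```

With the `internal_project_id` regex from earlier, the redact action turns `see PRJ-AB12CD34 now` into `see [REDACTED-internal_project_id] now`, while flag forwards the payload unchanged.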

Coexistence with Destination Allowlists

DLP has two layers that work together:

Pattern authoring (this doc) governs what the agent sends — content-level inspection of the payload.
Destination allowlists govern where the agent sends — endpoint-level enforcement of the destination URL.

A request that contains a credit-card number going to an unapproved endpoint trips both layers. Either layer alone is meaningful enforcement; together they form defence in depth. If an allowlist is misconfigured, content that matches a bound pattern is still caught; if a content pattern is missing, the payload still cannot reach an unapproved destination.

API Reference

All DLP pattern and binding endpoints are mounted under /api/v1. Bearer-token auth is required throughout; see API Overview for the auth scheme.

Patterns

Method & Path	Purpose
`GET /api/v1/dlp/patterns`	List patterns visible to the tenant (filters: `pattern_type`, `category`, `is_builtin`, `search`).
`POST /api/v1/dlp/patterns`	Create a custom pattern. `pattern_type` must be `regex` or `string_list`.
`PUT /api/v1/dlp/patterns/{id}`	Update a custom pattern. Built-ins return `403 Forbidden`.
`DELETE /api/v1/dlp/patterns/{id}`	Delete a custom pattern. Built-ins return `403 Forbidden`. Existing bindings to the pattern are removed as well.

Bindings

Method & Path	Purpose
`GET /api/v1/policies/{policy_id}/dlp-bindings`	List bindings for a policy.
`POST /api/v1/policies/{policy_id}/dlp-bindings`	Create a binding. Body carries `pattern_id`, `direction`, `action`, `priority`, `enabled`.
`PUT /api/v1/policies/{policy_id}/dlp-bindings/{binding_id}`	Update binding fields.
`DELETE /api/v1/policies/{policy_id}/dlp-bindings/{binding_id}`	Remove the binding. The underlying pattern is untouched.

Full request / response schemas are in the API reference.
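As a rough illustration, a create-pattern request could be assembled as below. The base URL, token value, and function name are placeholders, and the `Authorization: Bearer` header format is the conventional one for bearer tokens, assumed here; check API Overview for the authoritative scheme. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

CUSTOM_PATTERN_TYPES = {"regex", "string_list"}


def build_create_pattern_request(base_url, token, pattern):
    """Build (but do not send) POST /api/v1/dlp/patterns."""
    if pattern.get("pattern_type") not in CUSTOM_PATTERN_TYPES:
        # The API rejects classifier with 400 Bad Request; fail early.
        raise ValueError("pattern_type must be 'regex' or 'string_list'")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/dlp/patterns",
        data=json.dumps(pattern).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed header format
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response schema is documented in the API reference and is not guessed at here.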

Best Practices

Start with the Built-in Catalogue

The built-in patterns cover ~80% of real-world PII and credential exfiltration. Bind them to your most permissive policies first with action=flag, watch a week of audit events, then promote the high-signal ones to redact or block.

Author Custom Patterns at the Tenant Level, Bind at the Policy Level

A pattern like internal_project_id is a tenant-wide fact. Author it once, then let individual policy owners decide whether it applies to their agents. Resist the temptation to fork the pattern per policy — use the binding settings to vary the behaviour.

Prefer `string_list` Over `regex` When You Can

string_list patterns can't be mis-authored into a ReDoS hazard, are easier for non-engineers to maintain, and are faster to evaluate at the edge. Reach for regex only when you need structural matching.

Use Priority to Resolve Overlap

When two patterns can match the same content (e.g. a custom regex and a built-in classifier), the higher-priority binding wins. Give high-confidence patterns higher priority so they shadow fuzzy ones — and so the recorded pattern_id on the audit event is the one that best describes the match.

Pair with Destination Allowlists

A pattern binding without a destination allowlist catches content but not destinations; an allowlist without pattern bindings catches destinations but not content. Running both layers on every policy is the steady state.

Destination Allowlists — DLP destination layer
Threat Intelligence — Cross-tenant threat indicators
Injection Detection — Prompt injection detection
API Overview — Bearer-token auth and conventions