DLP Pattern Authoring
DLP pattern authoring is the content layer of MITRITY DLP — it governs what data is allowed to flow through your agents. It is the companion to destination allowlists, which form the destination layer of DLP — they govern where data can flow.
The Two-Layer Model
A DLP pattern is a reusable definition — a regex, a list of literal strings, or a built-in classifier. A pattern doesn't do anything on its own. To enforce, you attach the pattern to a policy with a binding, which carries the per-policy settings: direction, action, priority, enabled flag. The same pattern can be bound to many policies, each with different settings.
dlp_patterns (definitions)
│ M:N via bindings
▼
policy_dlp_bindings (per-policy: direction + action + priority + enabled)
│
▼
agents.policy_id → policies → bindings → patterns
This separation matters in practice: the security team curates a tenant-wide catalogue of patterns once, and individual policy owners decide which patterns apply to their agents and how strict the enforcement should be.
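The relationship can be sketched as two small record types. This is an illustrative model only: the class names, field names, and the `bindings_for_policy` helper are hypothetical, and the `enabled=True` default is an assumption (the source states defaults only for direction, action, and priority).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pattern:
    """A reusable definition; it enforces nothing until bound to a policy."""
    id: str
    pattern_type: str  # "regex", "string_list", or "classifier"
    category: str


@dataclass(frozen=True)
class Binding:
    """Per-policy settings attaching one pattern to one policy."""
    policy_id: str
    pattern_id: str
    direction: str = "both"  # default stated in the doc
    action: str = "flag"     # default stated in the doc
    priority: int = 50       # default stated in the doc
    enabled: bool = True     # assumed default, not stated in the doc


def bindings_for_policy(policy_id: str, bindings: List[Binding]) -> List[Binding]:
    """Return the enabled bindings attached to one policy."""
    return [b for b in bindings if b.policy_id == policy_id and b.enabled]
```

Because a binding stores only a `pattern_id`, the same pattern can appear under any number of policies with different settings.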
Where to Find It
Navigate to /app/policies, expand a policy row, and open the DLP sub-tab. The tab has two sections:
- Built-in patterns — the MITRITY-maintained catalogue covering common PII and credential formats.
- Custom patterns — patterns your tenant has authored. Editors and above can create, edit, and delete entries here.
Each section lists the bindings currently attached to this policy. Add a binding by selecting a pattern from the pattern picker; the binding inherits sensible defaults (direction both, action flag, priority 50) which you can tune inline.
Built-in Pattern Catalogue
MITRITY ships a curated set of detectors covering the most common PII and credential formats. These are maintained by MITRITY — customers cannot edit the underlying detection logic. You bind them and configure direction, action, and priority on the binding.
| Pattern | Type | Category | Detects |
|---|---|---|---|
| ssn_us | classifier | pii | US SSNs with structural validation (area not 000/666/9XX; group not 00; serial not 0000) |
| personnummer_se | classifier | pii | Swedish personal numbers (YYMMDD + 4 digits, real date, Luhn-style checksum) |
| credit_card | classifier | pii | Credit cards 13–19 digits, Luhn-validated |
| iban | classifier | pii | IBAN with mod-97 checksum |
| email | regex | pii | RFC 5322 simplified |
| phone | classifier | pii | E.164 + common national formats |
| ipv4 | regex | pii | Dotted-quad IPv4 addresses |
| aws_access_key | regex | credentials | AKIA[0-9A-Z]{16} |
| github_token | regex | credentials | ghp_ / gho_ / ghu_ / ghs_ / ghr_ prefixes |
| generic_api_key | classifier | credentials | Entropy + prefix heuristic. Low-confidence — recommended flag action only |
The classifier-backed built-ins do more than a regex match — they apply structural validation (Luhn, mod-97, real-date checks) to keep false positives manageable. The generic_api_key classifier is intentionally fuzzy; bind it with action=flag so you keep visibility without blocking legitimate traffic.
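To illustrate what structural validation adds over a bare regex, here is a minimal sketch of the two checksums named above. These are the standard Luhn and IBAN mod-97 algorithms, not MITRITY's actual classifier code; the function names are hypothetical.

```python
def luhn_valid(digits: str) -> bool:
    """Return True if an ASCII digit string passes the Luhn checksum."""
    if not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_mod97_valid(iban: str) -> bool:
    """Return True if the IBAN's mod-97 check (ISO 13616) equals 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 5 or not (s.isascii() and s.isalnum()):
        return False
    # Move country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Letters map to 10..35 (A=10); digits map to themselves.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A 16-digit string that merely looks like a card number fails `luhn_valid` most of the time, which is how checksum validation keeps false positives down.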
Custom Patterns
Customers can author two pattern types, regex and string_list. A third type, classifier, is reserved for built-ins.
regex
A single regular expression with an optional flags field. Only the i flag (case-insensitive) is accepted. Every regex is:
- Compile-tested at save time — invalid regexes are rejected with the underlying parse error.
- ReDoS-fuzzed with a 10 KB pathological input under a 100 ms budget. Patterns that time out are rejected before they can ever run in production.
{
  "name": "internal_project_id",
  "pattern_type": "regex",
  "pattern_data": { "pattern": "PRJ-[A-Z0-9]{8}", "flags": "" },
  "category": "internal"
}
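The save-time compile test and flag restriction described above might look roughly like the following. This is a sketch under stated assumptions: Python's `re` stands in for whatever regex engine the service actually uses, and `validate_regex_pattern` is a hypothetical name. The ReDoS fuzz step is deliberately left out, because the standard-library `re` module has no timeout and cannot abort a runaway match.

```python
import re

ALLOWED_FLAGS = {"i"}  # the doc accepts only the case-insensitive flag


def validate_regex_pattern(pattern_data: dict) -> "re.Pattern":
    """Reject unsupported flags and uncompilable patterns; return the compiled regex."""
    pattern = pattern_data["pattern"]
    flags = pattern_data.get("flags", "")
    unknown = set(flags) - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f"unsupported flags: {sorted(unknown)}")
    re_flags = re.IGNORECASE if "i" in flags else 0
    try:
        return re.compile(pattern, re_flags)
    except re.error as exc:
        # Surface the underlying parse error, as the doc describes.
        raise ValueError(f"invalid regex: {exc}") from exc
```

Calling it with the `pattern_data` object from the example above returns a compiled pattern that matches strings such as `PRJ-AB12CD34`.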
string_list
A literal list of strings with an optional case_insensitive flag. No regex semantics — characters like ., *, and ? are matched literally. Best for competitor names, internal product codenames, legal-hold phrases, project keywords.
{
  "name": "marketing_competitors",
  "pattern_type": "string_list",
  "pattern_data": { "strings": ["CompetitorA", "CompetitorB"], "case_insensitive": true },
  "category": "competitor"
}
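Literal matching with no regex semantics can be sketched as below. This is an illustration, not the edge implementation: `find_literal_matches` is a hypothetical helper that escapes each entry so characters such as `.` and `?` only match themselves, and it reports position and length rather than the matched text.

```python
import re
from typing import List, Tuple


def find_literal_matches(
    text: str, strings: List[str], case_insensitive: bool = False
) -> List[Tuple[int, int, str]]:
    """Return sorted (position, length, list_entry) tuples for literal hits."""
    flags = re.IGNORECASE if case_insensitive else 0
    hits = []
    for entry in strings:
        if not entry:
            continue  # an empty entry would match everywhere
        # re.escape neutralises regex metacharacters in the entry.
        for m in re.finditer(re.escape(entry), text, flags):
            hits.append((m.start(), m.end() - m.start(), entry))
    return sorted(hits)
```

With the entry `a.c`, the text `abc` produces no hit, because the dot is matched literally.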
classifier
Reserved for MITRITY-maintained built-ins. POST or PUT with pattern_type=classifier returns 400 Bad Request.
Bindings — Direction, Action, Priority
A binding is the configurable attachment between a policy and a pattern. Four fields control its behaviour:
- direction: one of inbound (data flowing into the agent, e.g. tool responses), outbound (data leaving the agent, e.g. tool arguments), or both. Defaults to both.
- action: one of block (deny the call), redact (replace each match with [REDACTED-<pattern_name>] before the payload is forwarded), or flag (audit-only, no payload modification). Defaults to flag.
- priority: integer, higher evaluates first, first match wins. Defaults to 50. Use this to ensure high-confidence patterns shadow lower-confidence ones on the same payload.
- enabled: boolean. Disable a binding temporarily without deleting it.
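How the four fields interact during evaluation can be sketched as follows. The names `BoundPattern` and `first_match` are hypothetical, the `matches` callable stands in for the real pattern engine, and the tie-break at equal priority (input order) is an assumption, since the source does not specify one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BoundPattern:
    pattern_id: str
    matches: Callable[[str], bool]  # stand-in for the real detector
    direction: str = "both"
    action: str = "flag"
    priority: int = 50
    enabled: bool = True


def first_match(
    payload: str, direction: str, bindings: List[BoundPattern]
) -> Optional[BoundPattern]:
    """Return the highest-priority enabled binding that matches, or None."""
    eligible = [
        b for b in bindings
        if b.enabled and b.direction in ("both", direction)
    ]
    # Higher priority evaluates first; sorted() is stable, so ties keep input order.
    for b in sorted(eligible, key=lambda b: b.priority, reverse=True):
        if b.matches(payload):
            return b
    return None
```

A disabled binding or one bound to the other direction is skipped before priority is considered, so it can never shadow a lower-priority match.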
Worked Example
The same marketing_competitors pattern, two different policies, two different behaviours:
- Marketing policy: bind with direction=outbound, action=block. Agents can research competitors (tool responses come in freely) but cannot publish anything containing competitor names.
- Support policy: bind with direction=both, action=redact. Support replies containing competitor names get auto-redacted before they reach the customer; the same applies to incoming customer messages that mention competitors in case the agent quotes them back.
That reuse — one pattern definition, multiple policy bindings with different intent — is the value of the two-layer model.
Direction Inference
The Edge Node has to decide which side of an MCP exchange to inspect. It infers inbound vs outbound from the action type using a heuristic: action types whose name contains any of send, post, put, write, upload, push, publish, or forward classify as outbound. Everything else classifies as inbound.
This is edge behaviour, not part of the API contract — the heuristic may evolve as we observe more tool categories. If you're not sure how a particular tool action classifies, bind with direction=both and the binding fires on either side of the exchange.
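The heuristic as described can be sketched in a few lines. This mirrors the documented keyword list only; the function name is hypothetical, lowercasing the action type is an assumption, and the real edge logic may differ or change.

```python
OUTBOUND_MARKERS = (
    "send", "post", "put", "write", "upload", "push", "publish", "forward",
)


def infer_direction(action_type: str) -> str:
    """Classify an action type as 'outbound' if it contains any marker."""
    name = action_type.lower()  # case-insensitivity is an assumption
    if any(marker in name for marker in OUTBOUND_MARKERS):
        return "outbound"
    return "inbound"
```

Because this is a substring test, a name such as `math:compute` contains `put` and classifies as outbound under this reading, which is one reason direction=both is the safer choice for ambiguous tools.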
GDPR-Safe Audit Contract
The rule is non-negotiable: DLP audit events never store the raw matched content. The matched substring is the most sensitive part of the request — it's literally the thing the policy is there to protect — so it must not be persisted in the audit store.
Every DLP audit event carries enough structural information to investigate a match without persisting the matched substring itself:
{
  "id": "evt_...",
  "tenant_id": "...",
  "agent_id": "...",
  "policy_id": "...",
  "pattern_id": "ssn_us",
  "pattern_name": "SSN (US)",
  "direction": "outbound",
  "action_taken": "blocked",
  "detection_reason": "pattern_match",
  "position": 247,
  "length": 11,
  "tool_pattern_matched": "drive:create",
  "occurred_at": "2026-05-26T10:14:00Z"
}
No value, no matched_text, no other field carrying raw content. The boundary between the detection engine and the audit reporter strips any such field defensively — even if a future engine change accidentally tried to attach raw content, the reporter would drop it before persistence.
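One way to implement such a defensive boundary is an allowlist of field names, sketched below. This is an assumed design, not the documented one: the source says only that raw-content fields are stripped, and `sanitize_audit_event` is a hypothetical name. The allowed set is taken from the example event above.

```python
ALLOWED_AUDIT_FIELDS = frozenset({
    "id", "tenant_id", "agent_id", "policy_id", "pattern_id", "pattern_name",
    "direction", "action_taken", "detection_reason", "position", "length",
    "tool_pattern_matched", "occurred_at",
})


def sanitize_audit_event(event: dict) -> dict:
    """Keep only known structural fields; drop anything else before persistence."""
    return {k: v for k, v in event.items() if k in ALLOWED_AUDIT_FIELDS}
```

An allowlist fails closed: a new field added by a future engine change is dropped by default instead of being persisted by accident.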
Mapping Binding Action to Edge Behaviour and Audit Outcome
| Binding action | Edge behaviour | action_taken value |
|---|---|---|
| block | Tool call denied with JSON-RPC error returned to the caller | blocked |
| redact | Match replaced with [REDACTED-<pattern_name>] before payload forwarded downstream | redacted |
| flag | Match passes through unchanged; event recorded for review | flagged |
action_taken reuses the v1 vocabulary so legacy ingest paths (SIEM forwarders, downstream alerting) keep working without changes. The new detection_reason='pattern_match' value distinguishes v2 binding-driven matches from v1 classifier reasons (sensitive_exfil, volume_anomaly, accumulation, unauthorized_destination), which continue to fire from the existing detection paths.
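The mapping in the table can be sketched as a single dispatch function. This is an illustration under assumptions: `apply_action` is a hypothetical name, a compiled Python regex stands in for the detector, and a `None` payload stands in for a denied call (the real edge returns a JSON-RPC error).

```python
import re
from typing import Optional, Tuple

ACTION_TAKEN = {"block": "blocked", "redact": "redacted", "flag": "flagged"}


def apply_action(
    action: str, payload: str, pattern_name: str, regex: "re.Pattern"
) -> Tuple[Optional[str], Optional[str]]:
    """Return (payload_to_forward, action_taken); action_taken is None on no match."""
    if action not in ACTION_TAKEN:
        raise ValueError(f"unknown action: {action}")
    if regex.search(payload) is None:
        return payload, None
    if action == "block":
        return None, ACTION_TAKEN[action]  # nothing is forwarded
    if action == "redact":
        placeholder = f"[REDACTED-{pattern_name}]"
        # A callable replacement avoids backslash handling in the placeholder.
        return regex.sub(lambda _m: placeholder, payload), ACTION_TAKEN[action]
    return payload, ACTION_TAKEN[action]  # flag: payload passes unchanged
```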
Coexistence with Destination Allowlists
DLP has two layers that work together:
- Pattern authoring (this doc) governs what the agent sends — content-level inspection of the payload.
- Destination allowlists govern where the agent sends — endpoint-level enforcement of the destination URL.
A request that carries a credit-card number to an unapproved endpoint trips both layers. Either layer alone is meaningful enforcement; together they form defence in depth: if an allowlist is misconfigured, content that matches a bound pattern is still caught, and if a content pattern is missing, the payload still cannot reach an unapproved destination.
API Reference
All DLP pattern and binding endpoints are mounted under /api/v1. Bearer-token auth is required throughout; see API Overview for the auth scheme.
Patterns
| Method & Path | Purpose |
|---|---|
| GET /api/v1/dlp/patterns | List patterns visible to the tenant (filters: pattern_type, category, is_builtin, search). |
| POST /api/v1/dlp/patterns | Create a custom pattern. pattern_type must be regex or string_list. |
| PUT /api/v1/dlp/patterns/{id} | Update a custom pattern. Built-ins return 403 Forbidden. |
| DELETE /api/v1/dlp/patterns/{id} | Delete a custom pattern. Built-ins return 403 Forbidden. Existing bindings to the pattern are removed as well. |
Bindings
| Method & Path | Purpose |
|---|---|
| GET /api/v1/policies/{policy_id}/dlp-bindings | List bindings for a policy. |
| POST /api/v1/policies/{policy_id}/dlp-bindings | Create a binding. Body carries pattern_id, direction, action, priority, enabled. |
| PUT /api/v1/policies/{policy_id}/dlp-bindings/{binding_id} | Update binding fields. |
| DELETE /api/v1/policies/{policy_id}/dlp-bindings/{binding_id} | Remove the binding. The underlying pattern is untouched. |
Full request / response schemas are in the API reference.
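As a rough illustration of the create-pattern and create-binding calls, the sketch below builds the two requests without sending them. Several details are assumptions: the host `mitrity.example`, the `Authorization: Bearer` header form, and the identifiers `pat_123` and `pol_456` are all hypothetical placeholders, and the authoritative schemas are in the API reference.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://mitrity.example"  # hypothetical host


def build_request(
    method: str, path: str, token: str, body: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON request with bearer-token auth."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed header form
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


create_pattern = build_request("POST", "/api/v1/dlp/patterns", "TOKEN", {
    "name": "internal_project_id",
    "pattern_type": "regex",
    "pattern_data": {"pattern": "PRJ-[A-Z0-9]{8}", "flags": ""},
    "category": "internal",
})

# "pol_456" and "pat_123" are placeholder identifiers.
create_binding = build_request("POST", "/api/v1/policies/pol_456/dlp-bindings", "TOKEN", {
    "pattern_id": "pat_123",
    "direction": "outbound",
    "action": "block",
    "priority": 80,
    "enabled": True,
})
# Sending would be urllib.request.urlopen(create_binding) against a real host.
```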
Best Practices
Start with the Built-in Catalogue
The built-in patterns cover ~80% of real-world PII and credential exfiltration. Bind them to your most permissive policies first with action=flag, watch a week of audit events, then promote the high-signal ones to redact or block.
Author Custom Patterns at the Tenant Level, Bind at the Policy Level
A pattern like internal_project_id is a tenant-wide fact. Author it once, then let individual policy owners decide whether it applies to their agents. Resist the temptation to fork the pattern per policy — use the binding settings to vary the behaviour.
Prefer string_list Over regex When You Can
string_list patterns can't be mis-authored into a ReDoS hazard, are easier for non-engineers to maintain, and are faster to evaluate at the edge. Reach for regex only when you need structural matching.
Use Priority to Resolve Overlap
When two patterns can match the same content (e.g. a custom regex and a built-in classifier), the higher-priority binding wins. Give high-confidence patterns higher priority so they shadow fuzzy ones — and so the recorded pattern_id on the audit event is the one that best describes the match.
Pair with Destination Allowlists
A pattern binding without a destination allowlist catches content but not destinations; an allowlist without pattern bindings catches destinations but not content. Running both layers on every policy is the steady state.
Related Documentation
- Destination Allowlists — DLP destination layer
- Threat Intelligence — Cross-tenant threat indicators
- Injection Detection — Prompt injection detection
- API Overview — Bearer-token auth and conventions