machine learning · anomaly detection · technical deep-dive

Real-Time Behavioral Drift Detection for Autonomous Agents

MITRITY Team | 8 min read


Static policy rules catch known-bad actions. But the most dangerous agent behaviors are not on any blocklist — they are novel sequences of individually legitimate actions that, taken together, constitute a policy violation. Detecting this requires understanding what an agent normally does and recognizing when it deviates.

This post describes the machine learning pipeline behind MITRITY's behavioral drift detection system: how we achieve sub-millisecond inference at the edge, what models run where, and how explainability is built into every alert.

What Is Behavioral Drift?

Every AI agent develops a behavioral fingerprint over time. A customer support agent typically queries the ticket database, reads customer profiles, generates responses, and occasionally escalates to a human. It does these things in predictable patterns — certain tools at certain frequencies, in certain sequences, during certain hours.

Behavioral drift occurs when an agent's actions deviate from this established baseline: not a single anomalous action, but a shift in the overall pattern. Think of the support agent that starts accessing financial records, the coding agent that begins making network requests to unfamiliar endpoints, or the data pipeline agent that suddenly writes to a customer-facing database instead of the analytics warehouse.

The challenge is distinguishing genuine drift (the agent is doing something it should not) from legitimate evolution (the agent's scope has been expanded, or a new workflow requires new tools). This is where ML-driven governance differs from static rules: the system learns what normal looks like and adapts as normal changes.

Two-Tier Architecture

MITRITY's detection system operates on two tiers, optimized for different latency and accuracy requirements.

Tier 1: Edge (Mitrity Edge)

Mitrity Edge is a lightweight binary (~2MB) deployed alongside your agents. It intercepts every agent action and must make an allow/block decision before the action executes. The latency budget for this decision is under 0.5 milliseconds — anything slower would degrade agent performance.

At this tier, we deploy DriftGuard — a Temporal Convolutional Network (TCN) compiled to ONNX format. DriftGuard processes the recent action sequence — the last N actions the agent has taken — and outputs an anomaly score. If the score exceeds the configured threshold, the action is blocked immediately.

Why a TCN architecture instead of an LSTM or Transformer? Three reasons:

  1. Parallelizable inference. TCNs use dilated causal convolutions, which means the entire sequence can be processed in a single forward pass without sequential dependencies. LSTMs process tokens one at a time. For a 128-action sequence, DriftGuard is an order of magnitude faster.

  2. Fixed memory footprint. The model size is approximately 2MB in ONNX format. It loads once and runs without dynamic memory allocation. This is critical for edge deployment where Mitrity Edge shares resources with the agent runtime.

  3. Causal architecture. Dilated causal convolutions ensure the model only looks at past actions when evaluating the current one — it cannot "peek" at future actions. This is a natural fit for real-time stream processing where you evaluate each action as it arrives.
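The causality property can be demonstrated with a minimal NumPy sketch. This is illustrative only (DriftGuard's actual architecture and weights are not public): a dilated causal convolution pads only on the left, so the output at step t can never depend on inputs after t.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """1-D causal convolution: output[t] depends only on x[<= t].

    x:      (T,) input sequence
    kernel: (K,) filter taps; kernel[0] weights the current timestep
    """
    K = len(kernel)
    # Left-pad so the filter never reaches past the current timestep.
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    out = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        # Taps at original positions t, t - dilation, t - 2*dilation, ...
        window = xp[t : t + pad + 1 : dilation]
        out[t] = window @ kernel[::-1]
    return out

# Causality check: perturbing a future input must not change past outputs.
x = np.random.default_rng(0).normal(size=16)
k = np.array([0.5, 0.3, 0.2])
y1 = causal_dilated_conv1d(x, k, dilation=2)
x2 = x.copy()
x2[10] += 100.0                          # perturb one future timestep
y2 = causal_dilated_conv1d(x2, k, dilation=2)
assert np.allclose(y1[:10], y2[:10])     # outputs before t=10 unchanged
```

Stacking layers with dilations 1, 2, 4, 8, ... is what lets a compact TCN cover a 128-action window in one parallel forward pass.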

DriftGuard is trained on per-agent behavioral data. Each agent gets a model fine-tuned to its specific patterns. When a new model version is available (after retraining on recent data), the Edge Node receives it via the heartbeat channel and hot-swaps it with zero downtime — no restart, no gap in coverage.
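The hot-swap pattern can be approximated in a few lines. This is a sketch of the general technique, not MITRITY's implementation: the new model is fully loaded before the reference is swapped, so there is never a moment without a servable model.

```python
import threading

class ModelSlot:
    """Holds the live model; swaps look atomic to readers."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def get(self):
        # Reference reads are atomic in CPython; the lock keeps the
        # pattern portable to runtimes without that guarantee.
        with self._lock:
            return self._model

    def swap(self, new_model):
        # new_model is fully constructed before this call, so coverage
        # never lapses during the exchange.
        with self._lock:
            old, self._model = self._model, new_model
        return old  # caller may unload the previous version

slot = ModelSlot(model="driftguard-v1")
assert slot.get() == "driftguard-v1"
old = slot.swap("driftguard-v2")
assert (old, slot.get()) == ("driftguard-v1", "driftguard-v2")
```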

Tier 2: Centralized (Control Plane)

The control plane runs deeper analysis on a longer time horizon. It receives action streams from all Edge Nodes and applies three additional models:

DeepTrace for long-range sequence analysis. While DriftGuard excels at short sequences (last 128 actions), some attack patterns unfold over hours or days. An agent that gradually escalates its access over a week — each day accessing slightly more sensitive resources — will not trigger DriftGuard because each individual session looks normal. DeepTrace (LSTM-based) maintains a longer memory window and detects these slow-burn patterns.

TrustGraph for relationship-based threats. Not all threats are visible in action sequences. Privilege escalation through service account chains, lateral movement across resource boundaries, and anomalous relationships between agents and resources are graph-structural problems. MITRITY builds a heterogeneous graph with four node types — agents, users, resources, and roles — and trains TrustGraph (a Graph Neural Network) to detect anomalous edges and subgraphs. When an agent accesses a resource through a path it has never used before, or when two agents that have never interacted start sharing an intermediate resource, TrustGraph flags it.
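As a toy illustration of the "never-used path" signal (the real TrustGraph is a trained GNN scoring edges and subgraphs over the heterogeneous graph; the names below are hypothetical), the simplest form of the check is membership against the set of previously observed edges:

```python
# Toy novel-edge check: flag an (agent, resource) access the graph has
# never seen before. Edge set and names are hypothetical examples.
known_edges = {
    ("support-agent", "ticket-db"),
    ("support-agent", "customer-profiles"),
    ("infra-agent", "compute-api"),
}

def is_novel_edge(agent: str, resource: str) -> bool:
    """True when this agent has never touched this resource."""
    return (agent, resource) not in known_edges

assert not is_novel_edge("support-agent", "ticket-db")      # baseline access
assert is_novel_edge("support-agent", "financial-records")  # flag for review
```

A trained GNN generalizes this far beyond exact-match lookups, scoring how structurally plausible a new edge is given the surrounding graph.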

ColdStart for new agents. New agents have no behavioral history. Without a baseline, DriftGuard and DeepTrace cannot detect drift because there is nothing to drift from. MITRITY solves this with ColdStart — a Variational Autoencoder trained on behavioral data from all agents across all tenants. ColdStart learns a latent representation of "normal agent behavior" that transfers to new agents, providing useful anomaly detection from the first action. As the agent accumulates its own history, the per-agent DriftGuard model takes over and the ColdStart contribution fades.
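One way to picture the hand-off from ColdStart to the per-agent model is a blending weight that decays as history accumulates. The schedule below is an assumption for illustration; MITRITY has not published the exact formula, and `ramp` is a hypothetical parameter:

```python
def blended_score(coldstart, driftguard, n_actions, ramp=1000):
    """Blend the cross-tenant ColdStart score with the per-agent
    DriftGuard score. `ramp` (hypothetical) is the action count at
    which the two models contribute equally."""
    w = ramp / (ramp + n_actions)   # 1.0 for a brand-new agent, -> 0 over time
    return w * coldstart + (1 - w) * driftguard

assert blended_score(0.8, 0.2, n_actions=0) == 0.8     # new agent: all ColdStart
assert blended_score(0.8, 0.2, n_actions=1000) == 0.5  # equal weight at ramp
```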

The Inference Pipeline

Here is how an action flows through the system:

  1. Agent initiates action. The agent calls a tool, makes an API request, or queries a database.

  2. Edge Node intercepts. Mitrity Edge captures the action metadata: action type, target resource, timestamp, parameters (hashed, not raw values), and the agent's recent action sequence.

  3. DriftGuard inference (< 0.5ms). The ONNX-compiled DriftGuard model processes the action sequence and outputs an anomaly score between 0 and 1.

  4. Policy evaluation. The anomaly score is combined with static policy rules (blocklists, rate limits, resource restrictions) to produce a final decision: allow, block, or escalate.

  5. Decision returned. The Edge Node returns the decision to the agent runtime. If blocked, the agent receives an error indicating the action was denied with a reason code.

  6. Async forwarding. Regardless of the decision, the action metadata is forwarded asynchronously to the control plane for deeper analysis by DeepTrace and TrustGraph. This does not affect the latency of the inline decision.

  7. Control plane analysis. DeepTrace and TrustGraph process the action in the context of longer history and cross-agent relationships. If they detect a threat that DriftGuard missed, they generate an alert and can push updated policy to the Edge Node for future blocking.
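Steps 3 and 4, combining the anomaly score with static rules, might look like the following sketch. The thresholds, reason codes, and rule shapes are illustrative assumptions, not MITRITY's actual policy engine:

```python
def decide(action, anomaly_score, blocklist, block_at=0.9, escalate_at=0.7):
    """Combine the model's anomaly score with static policy rules.

    Static rules win outright; otherwise the score is compared against
    the configured thresholds (illustrative values).
    """
    if (action["type"], action["resource"]) in blocklist:
        return ("block", "POLICY_BLOCKLIST")
    if anomaly_score >= block_at:
        return ("block", "ANOMALY_HIGH")
    if anomaly_score >= escalate_at:
        return ("escalate", "ANOMALY_REVIEW")
    return ("allow", "OK")

blocklist = {("terminate", "prod-db-primary")}
assert decide({"type": "query", "resource": "ticket-db"},
              0.12, blocklist) == ("allow", "OK")
assert decide({"type": "terminate", "resource": "prod-db-primary"},
              0.05, blocklist) == ("block", "POLICY_BLOCKLIST")
assert decide({"type": "write", "resource": "analytics"},
              0.95, blocklist) == ("block", "ANOMALY_HIGH")
```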

Behavioral Hashing

Raw action data is privacy-sensitive and high-volume. Sending every parameter of every action to the control plane would be both a privacy risk and a bandwidth problem. MITRITY uses behavioral hashing to compress action sequences into fixed-size representations that preserve behavioral patterns while discarding sensitive content.

Each action is hashed into a behavioral vector that encodes: the action type, the resource category (not the specific resource), the temporal position in the sequence, and the deviation from the agent's baseline for this action type. The hash is deterministic — the same behavioral pattern always produces the same hash — but not reversible. You cannot reconstruct the original action from the hash.
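A minimal sketch of the idea (the actual feature set and hash construction are MITRITY internals; the field names and bucketing here are assumptions): hash the behavioral features, never the raw parameters, so the digest is deterministic but not reversible.

```python
import hashlib

def behavioral_hash(action_type, resource_category, seq_position,
                    baseline_deviation):
    """Deterministic, non-reversible digest of behavioral features.

    The deviation is bucketed so near-identical patterns collide on
    purpose: that is what makes cross-tenant matching possible.
    """
    dev_bucket = round(baseline_deviation, 1)   # coarse-grain the deviation
    payload = f"{action_type}|{resource_category}|{seq_position}|{dev_bucket}"
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

h1 = behavioral_hash("db.query", "ticketing", 42, 0.31)
h2 = behavioral_hash("db.query", "ticketing", 42, 0.34)
assert h1 == h2                                  # same pattern, same hash
assert h1 != behavioral_hash("db.query", "finance", 42, 0.31)
```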

This allows TrustGraph and DeepTrace to operate on behavioral patterns without ever seeing raw customer data. It also enables cross-tenant threat intelligence: if the same behavioral hash appears across multiple tenants and is associated with confirmed threats, it can be added to a shared indicator set without exposing any tenant's data.

Explainable Alerts

An anomaly score without explanation is useless in practice. Security teams need to know why an action was flagged, not just that it was flagged. MITRITY provides two layers of explainability:

SHAP Attribution

Every alert includes SHAP (SHapley Additive exPlanations) values for each feature that contributed to the anomaly score. These values show exactly which aspects of the action were anomalous:

  • Temporal features: Was this action taken at an unusual time? Was the interval between actions abnormal?
  • Sequence features: Does this action follow an unusual predecessor? Is this sequence of actions novel for this agent?
  • Resource features: Is the target resource outside the agent's normal access pattern? Is the resource sensitivity level higher than usual?
  • Frequency features: Is the rate of this action type above the agent's baseline? Are there burst patterns?

SHAP values are computed per-feature, so the security team can see that an alert fired because "resource sensitivity was 3.2 standard deviations above baseline" rather than just "anomaly score = 0.87."
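The shape of that per-feature output can be sketched as follows. This is illustrative only: real SHAP values come from attributing the trained model's output (e.g. via the `shap` library), not from raw z-scores.

```python
def explain(feature_devs, top_k=2):
    """Turn per-feature deviations (in standard deviations from the
    agent's baseline) into a readable attribution, largest first."""
    ranked = sorted(feature_devs.items(), key=lambda kv: abs(kv[1]),
                    reverse=True)
    return [f"{name} was {dev:+.1f} std devs from baseline"
            for name, dev in ranked[:top_k]]

devs = {"resource_sensitivity": 3.2, "action_interval": -0.4,
        "action_rate": 1.1}
assert explain(devs) == [
    "resource_sensitivity was +3.2 std devs from baseline",
    "action_rate was +1.1 std devs from baseline",
]
```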

LLM-Generated Investigation Narratives

For high-severity alerts, MITRITY generates a natural-language investigation narrative using Claude Haiku. The narrative synthesizes the SHAP attribution, the action sequence, the agent's historical behavior, and the policy context into a readable summary:

"Agent infra-optimizer was blocked from terminating instance prod-db-primary at 14:32 UTC. This action was flagged because: (1) the agent has never terminated database instances before — its history shows only compute instance management; (2) the action occurred during a burst of 12 termination requests in 30 seconds, compared to its baseline of 1-2 per hour; (3) the target is a production database, which is classified as a critical resource. Recommended action: review the agent's task assignment and verify whether database management was recently added to its scope."

This is not a generic template. It is generated from the specific features of the alert and adapts to the context. Security engineers get actionable investigation starting points, not raw numbers.
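Generating such a narrative amounts to assembling the alert evidence into a prompt for the model. A sketch of the assembly step follows; the prompt wording and field names are assumptions, and the actual call to Claude Haiku is omitted:

```python
def build_narrative_prompt(alert):
    """Assemble SHAP attribution, the action, and the decision into a
    prompt for the narrative model (model call itself omitted)."""
    lines = [
        f"Agent {alert['agent']} action '{alert['action']}' was "
        f"{alert['decision']} at {alert['time']}.",
        "Top anomalous features:",
    ]
    lines += [f"- {feat}: {val}" for feat, val in alert["shap_top"]]
    lines.append("Write a short investigation summary with a recommended "
                 "next step for the security team.")
    return "\n".join(lines)

prompt = build_narrative_prompt({
    "agent": "infra-optimizer", "action": "terminate prod-db-primary",
    "decision": "blocked", "time": "14:32 UTC",
    "shap_top": [("resource_sensitivity", "+3.2 sd"),
                 ("termination_rate", "12 in 30s vs 1-2/hr baseline")],
})
assert "infra-optimizer" in prompt and "resource_sensitivity" in prompt
```

Grounding the prompt entirely in alert evidence is what keeps the narrative specific rather than templated.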

Model Lifecycle

Models are not static. Agent behavior evolves as workflows change, new tools are added, and business requirements shift. MITRITY's model lifecycle handles this automatically:

  1. Continuous data collection. Every action (hashed) is stored in the training pipeline.
  2. Periodic retraining. Per-agent DriftGuard models are retrained on a configurable schedule (default: weekly). ColdStart and TrustGraph are retrained daily across all tenant data (using only behavioral hashes — no raw data).
  3. Validation. New models are validated against a holdout set of known-good and known-bad sequences. A model is only promoted if it maintains or improves detection accuracy.
  4. Deployment. Updated DriftGuard models are pushed to Edge Nodes via the heartbeat channel. The Edge Node loads the new model in memory and swaps it atomically — the old model handles requests until the new one is ready.
  5. Feedback loop. When security teams confirm or dismiss alerts, that signal feeds back into the training data, improving future model accuracy.
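Step 3's promotion gate can be sketched as a simple comparison against the incumbent model on the holdout set. Metric names and the zero-regression tolerance below are illustrative assumptions:

```python
def should_promote(candidate, incumbent, max_fpr_regression=0.0):
    """Promote only if detection recall does not drop and the false
    positive rate does not regress beyond tolerance (illustrative)."""
    return (candidate["recall"] >= incumbent["recall"]
            and candidate["fpr"] <= incumbent["fpr"] + max_fpr_regression)

incumbent = {"recall": 0.94, "fpr": 0.018}
assert should_promote({"recall": 0.95, "fpr": 0.015}, incumbent)      # better
assert not should_promote({"recall": 0.90, "fpr": 0.010}, incumbent)  # recall drop
```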

Performance Characteristics

Metric | Value
DriftGuard inference latency (p99) | < 0.5 ms
DriftGuard model size (ONNX) | ~2 MB
Action sequence window (Edge Node) | 128 actions
DeepTrace sequence window (Control Plane) | 10,000 actions
TrustGraph graph refresh interval | 5 minutes
Model hot-swap downtime | 0 ms
False positive rate (after 7-day baseline) | < 2%

What This Means in Practice

Behavioral drift detection is not a feature you configure and forget. It is a continuous learning system that adapts to your agents as they evolve. DriftGuard gives you sub-millisecond inline protection. DeepTrace catches slow-burn attacks. TrustGraph detects structural threats that no sequence model can see. And the explainability layer ensures your security team can act on alerts without reverse-engineering ML outputs.

The result: your agents operate at full speed, and your governance operates at the same speed.


Want to see behavioral drift detection in action? Start a free trial or read the Edge Node deployment guide to deploy your first governed agent in minutes.
