Now in beta — 20 spots remaining

Know when your AI agent
starts going off-script

Agent Drift Detector monitors your AI agent's output quality over time and fires an alert before bad outputs damage your business. Built for operators — not ML engineers.

$49/mo per agent
87 avg health score
18% avg drift detected
Customer Support Agent
87
Health Score
All systems normal · Last checked: 2 hours ago
30-Day Drift Trend
Health Score
Baseline (83)

AI agents silently degrade.
You usually find out too late.

Base model updates, data staleness, and prompt drift cause agents to produce worse outputs over days or weeks — without any warning.

Silent Degradation

Agents produce increasingly worse outputs with no indication anything changed. You only notice when customers complain or leads go cold.

No Quality Baseline

Without a reference for what "good" looks like, you have no way to measure whether today's output is as good as last week's.

Tools Built for Engineers

Existing observability tools (Arize, LangSmith, Phoenix) require ML expertise. Non-technical operators have no product built for them.

Three steps to agent clarity

No ML expertise required. Just tell us what good looks like, and we'll watch for problems.

1

Define Your Ground Truth

Submit 3–5 examples of what a good output looks like for your agent. That's your reference standard — no training data needed.

2

Connect via Webhook

Point your agent's output stream to Drift Detector with a single webhook URL. Or paste outputs manually for testing.

3

Get Drift Alerts

We score every output against your ground truth. When quality drops more than 15%, you get an email alert before damage is done.

Actionable alerts, not noise

Every alert tells you what changed, why it matters, and what to do next.

Agent Drift Detected — Content Agent
driftdetector.ai • 2 hours ago

Your Content Agent's health score dropped 18% below baseline. This is the largest single-day drop in 14 days.

83
Baseline score
68
Current score
  • Semantic similarity dropped 22% — agent is using different phrasing and tone than ground truth examples
  • Keyword overlap with knowledge base dropped 15% — agent may be referencing outdated facts
  • Structure compliance dropped to 60% — agent missing the step-by-step format from ground truth

Simple pricing. No surprises.

Start free with beta access. Upgrade when you're ready.

Starter
Perfect for one agent monitoring
$49 / month
  • 1 AI agent monitored
  • 100 outputs/month
  • Email drift alerts
  • Weekly digest
  • Ground truth editor
  • Slack alerts
Beta users get 3 months free, then a $29/mo intro rate for their first 90 paid days, then the standard $49/mo

Common questions

How is "drift" different from an error?

An error means the agent failed outright — a crash, a timeout, a bad response code. Drift is subtler: the agent is still working, still producing outputs, but those outputs are progressively worse than your reference standard. Drift is what happens before errors become obvious to your customers.

Do I need to know what "ground truth" means?

No. Ground truth just means "examples of good outputs." You paste in 3–5 past outputs that you're happy with, and Drift Detector uses those as the reference. If new outputs diverge significantly, you get an alert. That's it — no training, no ML expertise needed.

Which AI agents are supported?

Any agent that can send a webhook or HTTP POST. We support OpenAI, Anthropic, Google Gemini, and custom agents via webhook. Our scoring algorithm is platform-agnostic — we score the output text, not the model or provider.
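For a custom agent, the integration can be as small as the sketch below. It builds the POST described in the dashboard, using the example hook URL and the `{"output": "..."}` payload shown there; the function name and any extra structure are ours, not the official SDK.

```python
import json
import urllib.request

# Example hook URL from the dashboard; replace with your own agent's URL.
WEBHOOK_URL = "https://api.driftdetector.ai/hook/abc123"

def build_request(output_text: str) -> urllib.request.Request:
    """Build a POST request carrying one agent output for scoring."""
    body = json.dumps({"output": output_text}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send one output:
# urllib.request.urlopen(build_request("agent reply text"))
```

Any language or platform that can make an HTTP POST works the same way, which is why the scoring is indifferent to which model produced the text.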

What's the scoring algorithm?

We use three components: (1) Semantic similarity — does the new output mean the same thing as your ground truth? (40% weight); (2) Keyword/fact overlap — are the right entities and facts present? (35%); (3) Structure compliance — does it follow the same format? (25%). We compute a composite score from 0–100. An alert fires when the score drops more than 15% below your rolling 7-day baseline.
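Under those weights, the composite and the alert rule work out as in this sketch (the 40/35/25 weights and the 15% default threshold come from the answer above; the function names are illustrative, not the actual implementation):

```python
def composite_score(semantic: float, keyword: float, structure: float) -> float:
    """Blend the three 0-100 component scores with 40/35/25 weights."""
    return 0.40 * semantic + 0.35 * keyword + 0.25 * structure

def should_alert(current: float, baseline: float, sensitivity: float = 0.15) -> bool:
    """Fire when the current score falls more than `sensitivity` below baseline."""
    return current < baseline * (1 - sensitivity)

# The sample alert earlier on this page: baseline 83, current 68.
# 68 < 83 * 0.85 = 70.55, so an alert fires.
```

Raising `sensitivity` toward 0.25 tolerates more variation before alerting; lowering it toward 0.10 catches drift earlier at the cost of more noise, matching the tuning range described in the false-positive answer below.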

What if the alert is a false positive?

You can mark alerts as "false positive" directly from the email or dashboard. This feedback improves your personal threshold calibration over time. You can also manually tune the sensitivity from 10% to 25% in settings — more sensitive catches drift earlier, less sensitive reduces noise.

How is this different from Arize or LangSmith?

Arize and LangSmith are built for ML engineers and dev teams — they require tracing setup, span configuration, and ML knowledge to interpret. Drift Detector is built for the business owner or operator running an AI agent who doesn't know what a "span" is. We score the business outcome (output quality), not the technical signal.

Your agent was fine last week.
Is it still?

Join 20 beta users monitoring their agents right now. Free for 3 months.


Agent Dashboard

Monitoring 1 agent • Last checked 2 hours ago

Active Agent
Customer Support Agent
5 outputs scored today
OpenAI GPT-4o
Current Health Score
87
Health Score
Baseline:
83
Healthy
Webhook URL
https://api.driftdetector.ai/hook/abc123

Send agent outputs to this URL via POST. Include {"output": "your text"} in the body.

30-Day Drift Trend
Avg score: 82.4
Drift events: 3 this month
Outputs today: 5
Recent Scored Outputs
Time · Output preview · Score · Status
Alert History · 3 alerts
Ground Truth Examples · 3 of 5 examples
Refresh examples monthly for accurate scoring