Setting Anomaly Detection Thresholds Without Drowning in Alerts

Alert fatigue is the silent killer of API security programs. How to tune behavioral baselines so your team responds to real threats, not noise.

Reena Solberg
Head of Product, Hirefathom

Alert fatigue is a well-understood problem in security operations, and behavioral anomaly detection has a specific version of it that's worth treating separately from the general SIEM noise problem. The general noise problem is usually about volume: too many alerts from too many sources, with too little context to prioritize. The behavioral version is about threshold misconfiguration: systems that are technically working as designed, but whose signals correspond to real threats too rarely for the alerts to be actionable.

If your anomaly detection is firing 400 alerts per day on your API traffic and only 20 of them turn out to be worth acting on, you haven't built a detection capability; you've built a noise generator. Your team learns to treat the queue as a stream of low-signal events and develops heuristics for filtering it down to something manageable. The real threats in that queue don't get triaged faster; they get treated with the same skepticism as the 380 false positives surrounding them. This is worse than having no anomaly detection, because it creates the illusion of coverage while degrading your team's response effectiveness.

Getting threshold configuration right requires understanding what behavioral baselines actually are, how they degrade, and what operational choices create the conditions for useful signals versus constant noise.

What a Behavioral Baseline Is (and What It Isn't)

A behavioral baseline for an API endpoint is a statistical model of what normal traffic to that endpoint looks like. It captures distributions — not fixed values. A baseline for GET /api/catalog/search might describe: request arrival rate follows a diurnal curve peaking in afternoon hours with variance X; query parameter combinations cluster in 12 dominant patterns; response latency follows a log-normal distribution with a median of 180ms; 95th percentile response payload size is 42KB; caller token population is 3,200 distinct active users per rolling 24-hour window.
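
To make "distributions, not fixed values" concrete, here is a minimal sketch of how a few of those per-endpoint statistics could be summarized from raw request records. The `EndpointBaseline` and `summarize` names and the record layout are hypothetical, chosen for illustration; a real baseline would also model the diurnal rate curve and the query parameter clusters, which this sketch leaves out.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class EndpointBaseline:
    median_latency_ms: float
    p95_payload_kb: float
    distinct_callers: int


def summarize(requests):
    """Summarize (caller_token, latency_ms, payload_kb) records.

    The record layout is an assumption for this sketch.
    Needs at least two records, because quantiles() does.
    """
    latencies = [r[1] for r in requests]
    payloads = [r[2] for r in requests]
    return EndpointBaseline(
        median_latency_ms=median(latencies),
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        p95_payload_kb=quantiles(payloads, n=20, method="inclusive")[-1],
        distinct_callers=len({r[0] for r in requests}),
    )


sample = [("u1", 170, 10), ("u2", 180, 20), ("u1", 190, 40)]
baseline = summarize(sample)
print(baseline.median_latency_ms, baseline.distinct_callers)  # 180 2
```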

A useful baseline is not a single threshold. "More than 500 requests per minute is anomalous" is not a behavioral baseline; it's a static threshold. A behavioral baseline adapts. Monday morning at 9 AM might see 800 requests per minute to that endpoint; Tuesday night at 2 AM might see 40. An alert that fires when traffic exceeds 500 requests per minute will generate false positives during the day (when 800 is normal) and miss real anomalies at night (when 200 is five times the usual rate for that hour but still well under the threshold). Static thresholds applied to dynamic traffic patterns are almost always misconfigured in at least half of the time windows they cover.
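
The difference is easy to see in a small sketch that compares the static 500 requests-per-minute rule with a z-score check against history for the same hour of the week. The sample rates and the cutoff of 3.0 are made-up values for illustration, and a plain z-score is only one of several ways to score deviation.

```python
from statistics import mean, stdev

STATIC_LIMIT = 500  # requests per minute, the static rule from the text


def static_alert(rate):
    return rate > STATIC_LIMIT


def baseline_alert(rate, history, z_cutoff=3.0):
    """Flag a rate that sits more than z_cutoff standard deviations
    from the mean of earlier observations in the same hour-of-week slot."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return rate != mu
    return abs(rate - mu) / sigma > z_cutoff


# Hypothetical per-minute rates seen in each hour-of-week slot.
monday_9am = [780, 810, 795, 820, 805, 790]
tuesday_2am = [38, 42, 40, 45, 37, 41]

# 800 rpm on Monday morning: static rule fires, baseline does not.
print(static_alert(800), baseline_alert(800, monday_9am))   # True False
# 200 rpm on Tuesday night: static rule is silent, baseline fires.
print(static_alert(200), baseline_alert(200, tuesday_2am))  # False True
```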

The baseline needs to capture the traffic patterns that are normal for your specific endpoint, including temporal periodicity (daily cycles, weekly patterns, seasonal effects), correlation with external events (your marketing team ran a campaign, your mobile app pushed a notification), and endpoint-specific caller populations (this endpoint is called only by mobile clients; this other one is an internal service-to-service call with highly regular timing).

The Observation Window Problem

Behavioral baselines are only as good as the data they're built on. A baseline trained on one week of traffic sees each weekday and each weekend day exactly once, and it will miss anything that recurs less often than weekly: end-of-month billing spikes, seasonal variations. A baseline trained on three months of traffic is more robust but may be stale if your product has grown significantly in that time; the traffic patterns from a service at 10,000 daily active users are structurally different from the same service at 100,000.

The practical guidance: baselines should use adaptive windows that weight recent traffic more heavily than older traffic (exponentially weighted moving averages, not simple rolling averages), while retaining enough historical signal to capture weekly and monthly periodicity. A baseline model that updates every 24 hours with a decay factor that gives the last 7 days roughly 70% of the weight is a reasonable starting point for most API endpoints. Endpoints with strong seasonal patterns need longer windows or explicit seasonal decomposition.
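
As a worked example of that starting point, the sketch below derives the smoothing factor that gives the last 7 daily updates roughly 70% of the total weight. It assumes the standard EWMA recurrence and a history long enough that the weights sum to approximately 1; the function names are ours, not from any particular library.

```python
def alpha_for_recent_weight(days, weight):
    """Smoothing factor such that the most recent `days` daily updates
    carry `weight` of the total EWMA weight.

    The update k steps back has weight alpha * (1 - alpha)**k, so the
    most recent `days` updates sum to 1 - (1 - alpha)**days.
    """
    return 1.0 - (1.0 - weight) ** (1.0 / days)


def ewma_update(previous, observed, alpha):
    """One daily baseline update: blend today's value into the running estimate."""
    return alpha * observed + (1.0 - alpha) * previous


alpha = alpha_for_recent_weight(days=7, weight=0.70)
print(round(alpha, 3))  # 0.158
```

In other words, each daily update contributes about 16% of the new estimate, and anything older than a week shares the remaining 30%.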

New endpoints need a cold-start strategy. You can't build a meaningful behavioral baseline on zero traffic. Options include: start in observe-only mode for a defined period before enabling alerting (typically 72–96 hours for endpoints with regular traffic); bootstrap the baseline from similar endpoints with the same traffic class; or use loose thresholds (high z-score cutoffs) initially and tighten them as the baseline stabilizes. Alerting aggressively on a new endpoint's first week of traffic is a reliable way to create the first wave of false positives that starts eroding your team's trust in the detection system.
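
One way to combine the observe-only period with thresholds that start loose and tighten is a simple schedule keyed on how long the endpoint has been observed. This is a sketch under assumed numbers: the 72-hour observe-only window comes from the range above, while the two-week stabilization point and the z-score cutoffs of 6.0 and 3.0 are placeholders to tune.

```python
from typing import Optional


def z_cutoff_for_age(hours_observed: float,
                     observe_only_hours: float = 72.0,
                     stable_hours: float = 336.0,  # assumed: two weeks
                     loose_z: float = 6.0,
                     tight_z: float = 3.0) -> Optional[float]:
    """Return None while the endpoint is in observe-only mode,
    otherwise a cutoff that tightens linearly from loose_z to tight_z."""
    if hours_observed < observe_only_hours:
        return None  # collect data, do not alert yet
    if hours_observed >= stable_hours:
        return tight_z
    progress = (hours_observed - observe_only_hours) / (stable_hours - observe_only_hours)
    return loose_z - progress * (loose_z - tight_z)


print(z_cutoff_for_age(24))   # None
print(z_cutoff_for_age(204))  # 4.5
print(z_cutoff_for_age(400))  # 3.0
```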

Tuning for Signal-to-Noise: Practical Mechanics

The most common threshold misconfiguration is setting anomaly sensitivity too high globally and then dealing with the resulting noise. The better approach is to set sensitivity per endpoint class and tune from actual observed alert quality, not from assumptions about what sensitivity level feels right.

Start by segmenting your API endpoints by traffic class and security relevance:

  • High-sensitivity, high-security endpoints: authentication, payment processing, account modification, data export. These warrant tight baselines (lower z-score thresholds), faster alerting, and human triage on every alert. Alert volume on these endpoints should be low enough to sustain that level of triage.
  • Medium-sensitivity endpoints: user data reads, search, core product functionality. Standard baseline sensitivity with automated enrichment before alerting (is this IP in a known residential proxy pool? is this session showing other anomalies?). Human triage on the enriched subset.
  • Low-sensitivity, high-volume endpoints: public data, status checks, catalog browsing. Coarse baselines, high z-score thresholds (alert only on large deviations), aggregate monitoring rather than per-request tracking.
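
The three classes above could be expressed as a small policy table that the detection pipeline looks up per endpoint. The cutoff values, the endpoint paths other than the search example, and the `ClassPolicy` structure are all hypothetical; the point is only that sensitivity lives with the class rather than in one global setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassPolicy:
    z_cutoff: float            # lower means more sensitive
    enrich_before_alert: bool  # run automated enrichment first
    triage: str
    granularity: str


# Illustrative values only; tune them from observed alert quality.
POLICIES = {
    "high": ClassPolicy(2.5, False, "human triage on every alert", "per-request"),
    "medium": ClassPolicy(3.5, True, "human triage on enriched subset", "per-request"),
    "low": ClassPolicy(5.0, False, "alert on large deviations only", "aggregate"),
}

# Hypothetical endpoint-to-class assignments.
ENDPOINT_CLASS = {
    "POST /api/auth/login": "high",
    "GET /api/catalog/search": "medium",
    "GET /api/status": "low",
}


def policy_for(endpoint):
    """Unclassified endpoints fall back to the medium class in this sketch."""
    return POLICIES[ENDPOINT_CLASS.get(endpoint, "medium")]


print(policy_for("POST /api/auth/login").z_cutoff)  # 2.5
```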

Within each class, the right threshold calibration comes from empirical feedback. Track two metrics consistently: false positive rate (the share of alerts that investigation confirms are legitimate traffic) and false negative rate (the share of confirmed attacks that didn't generate alerts before detection through other means). The goal isn't to minimize false positives at the expense of coverage. It's to find the threshold that gives your team an alert queue small enough to triage fully and a detection rate high enough to catch real threats within a useful timeframe.
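
Both metrics fall out of three counts that an investigation workflow already produces. The sketch below uses the 400-alerts, 20-real example from earlier in this post; the figure of 5 missed attacks is invented purely to make the arithmetic visible.

```python
def alert_quality(true_alerts, false_alerts, missed_attacks):
    """Return (false positive share of alerts, miss rate over confirmed attacks)."""
    total_alerts = true_alerts + false_alerts
    confirmed_attacks = true_alerts + missed_attacks
    fp_share = false_alerts / total_alerts if total_alerts else 0.0
    miss_rate = missed_attacks / confirmed_attacks if confirmed_attacks else 0.0
    return fp_share, miss_rate


# 400 alerts, 20 confirmed as real, plus 5 attacks found by other means (assumed).
fp_share, miss_rate = alert_quality(true_alerts=20, false_alerts=380, missed_attacks=5)
print(fp_share, miss_rate)  # 0.95 0.2
```

Tracking the pair per endpoint class, rather than globally, keeps a noisy low-sensitivity class from hiding a miss on a high-security one.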

When Legitimate Traffic Looks Like an Attack

Some of the most persistent false positive sources in API behavioral monitoring come from legitimate traffic that genuinely looks anomalous:

Batch jobs and automated integrations. An internal cron job that runs at 3 AM and makes 50,000 calls to an API endpoint in 20 minutes will look like an attack to any anomaly detection system that doesn't know the job exists. Automated integrations from partners with predictable but high-volume call patterns hit the same problem. The fix is service account labeling — these callers should be tagged in your baseline model as "scheduled automation" and their traffic modeled separately from interactive user traffic. Their anomaly model is different: you want to alert if the job doesn't run, or if its call pattern changes unexpectedly, not if it runs as scheduled.
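
A separate model for scheduled automation can be very simple. The sketch below checks one run of a labeled job against its expected call volume and reports only the two conditions named above: the job did not run, or its pattern changed. The `ScheduledJob` fields and the 20% tolerance are assumptions for illustration, and call volume stands in for whatever features describe the job's pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScheduledJob:
    name: str
    expected_calls: int
    tolerance: float  # allowed fractional deviation from expected_calls


def check_job(job: ScheduledJob, observed_calls: int) -> Optional[str]:
    """Return an alert reason, or None when the job ran as scheduled."""
    if observed_calls == 0:
        return "missing_run"
    deviation = abs(observed_calls - job.expected_calls) / job.expected_calls
    if deviation > job.tolerance:
        return "pattern_changed"
    return None


nightly = ScheduledJob(name="nightly-sync", expected_calls=50_000, tolerance=0.2)
print(check_job(nightly, 50_000))   # None
print(check_job(nightly, 0))        # missing_run
print(check_job(nightly, 120_000))  # pattern_changed
```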

Promotional spikes. A marketing campaign that drives 10x traffic to your product catalog API in an hour is not an attack, but it will generate rate-anomaly alerts in any system that doesn't know the campaign is running. Traffic context feeds — a lightweight API or annotation system that lets your marketing or growth team signal upcoming campaigns to your security tooling — dramatically reduce this class of false positive.
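
A traffic context feed does not need to be elaborate. In the sketch below, each annotation names an endpoint prefix, a time window, and an expected traffic multiplier, and the detector scales its expected rate before scoring. The `Annotation` shape and the example campaign are hypothetical, not a description of any specific product's API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Annotation:
    endpoint_prefix: str
    start: datetime
    end: datetime
    expected_multiplier: float


def adjusted_expected_rate(endpoint, when, baseline_rate, annotations):
    """Scale the baseline rate by the largest multiplier among active annotations."""
    multiplier = 1.0
    for a in annotations:
        if endpoint.startswith(a.endpoint_prefix) and a.start <= when <= a.end:
            multiplier = max(multiplier, a.expected_multiplier)
    return baseline_rate * multiplier


campaign = Annotation(
    endpoint_prefix="/api/catalog",
    start=datetime(2024, 6, 1, 12, 0),
    end=datetime(2024, 6, 1, 14, 0),
    expected_multiplier=10.0,
)

during = adjusted_expected_rate("/api/catalog/search", datetime(2024, 6, 1, 13, 0), 100.0, [campaign])
after = adjusted_expected_rate("/api/catalog/search", datetime(2024, 6, 1, 15, 0), 100.0, [campaign])
print(during, after)  # 1000.0 100.0
```

Scaling the expectation, rather than muting alerts for the window, means a spike far beyond the announced campaign size can still fire.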

Mobile app pushes and client-side caching changes. A mobile app update that changes client-side caching behavior can cause a traffic pattern that looks like a sudden demand spike. A/B test rollouts can split traffic in ways that create artificially anomalous segments. These are engineering-domain events that behavioral security tooling needs to be aware of to avoid treating them as threat signals.

The Alert Queue as a Feedback Signal

We're not suggesting that perfect threshold configuration eliminates the need for ongoing tuning. It doesn't. API traffic patterns evolve continuously — new features ship, user behavior changes, integrations are added. A threshold configuration that was well-tuned three months ago will have drifted toward either over-alerting or under-coverage as the underlying traffic patterns changed.

The right operational model treats your alert queue as a continuous feedback signal. Every investigation that concludes "this is legitimate traffic" should generate a tuning action: adjust the baseline for this traffic pattern, label the caller as a known automated source, or raise the alert threshold (that is, lower the sensitivity) for this endpoint class. Every confirmed threat that wasn't alerted should generate the inverse: lower the alert threshold, add a correlated signal, or increase the monitoring intensity on endpoints showing similar patterns.
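
The threshold part of that loop can be sketched as a periodic adjustment driven by investigation outcomes. The step size, the bounds, and the decision to weight a false positive and a missed attack equally are all assumptions made for this example; many teams would weight a miss more heavily, and baseline relabeling is a separate action not shown here.

```python
def adjust_cutoff(cutoff, false_positives, missed_attacks,
                  step=0.25, floor=2.0, ceiling=8.0):
    """Nudge a z-score cutoff after a review period.

    Each false positive pushes the cutoff up (less sensitive);
    each missed attack pushes it down (more sensitive).
    The result is clamped to [floor, ceiling].
    """
    proposed = cutoff + step * (false_positives - missed_attacks)
    return max(floor, min(ceiling, proposed))


print(adjust_cutoff(3.0, false_positives=4, missed_attacks=0))  # 4.0
print(adjust_cutoff(3.0, false_positives=0, missed_attacks=2))  # 2.5
```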

The teams that achieve high-quality behavioral detection aren't the ones that found the right threshold values upfront. They're the ones that built feedback loops between their investigation workflows and their detection configuration — treating threshold tuning as an ongoing operational practice, not a one-time setup task. The baseline is never done. Neither is the tuning.