Affiliate reporting looks calm — until it isn’t. One Tuesday morning the GEO split looks normal; the next, a single sub-affiliate’s CR jumps from 3.4% to 19%, EPC quadruples, and by the time anyone notices, two days of payouts have already been approved against fraudulent traffic. The cost of finding out late is real money, and the antidote is not “look at dashboards more often.” It is anomaly detection wired directly into your reporting stack, with alerts that ping the right person inside the first hour of weird behavior.
This guide walks through what to monitor, which detection methods actually work for affiliate data, how to set up alerts without drowning your team in noise, and the small operational habits that turn alerts from a notification firehose into a managed risk system.
Why Static Thresholds Almost Always Fail
The first instinct when a manager says “alert me when something is wrong” is to write a static rule: if EPC > $5, page me. This works for about two weeks. Then a new GEO opens, a high-converting offer launches, or a regulated vertical hits its seasonal spike, and the static threshold either fires twenty times a day or stops firing at all. Static rules treat affiliate data as if it were stationary, but the underlying distribution moves constantly. Volume scales with media budget. CR moves with creative fatigue. Approval rate shifts when an advertiser tightens validation. A rule tuned to last month’s median is already out of date this month.
The deeper issue is that affiliate metrics are not independent. A 40% CR drop on a single offer might be normal (the advertiser paused a landing page), or it might be catastrophic (a tracking pixel broke at 03:00 UTC). Static thresholds can’t tell the difference because they don’t carry context — they don’t know what was true yesterday at the same hour, what the seven-day baseline looks like, or whether the same drop is happening on neighboring offers. The job is not to detect “high” or “low” values; it is to detect deviations from expected behavior given context. That requires statistics, not just thresholds.
Even when teams know this, they tend to over-engineer the response by reaching for machine-learning libraries on day one. You do not need an LSTM to find a broken postback. You need a baseline, a band around that baseline, and a rule that distinguishes a real outlier from normal noise. Start simple, prove value, then add sophistication only where the simple version is consistently wrong.
The Metrics That Actually Need Alerts
Not every metric deserves an alert. The fastest way to burn out an analytics team is to wire detection to anything that moves. Focus on the metrics where a sudden change signals fraud, a tracking break, or an advertiser-side problem you need to escalate before payouts close.
The core set for most affiliate programs is short: clicks per affiliate per hour, conversion rate (CR) per offer and per affiliate, approval rate from the advertiser, EPC by sub-affiliate, payout-to-revenue ratio per offer, and traffic source distribution by referrer or device. Add chargeback rate, scrub rate, and duplicate-conversion rate if your verticals are sensitive (rebill, financial services, app installs). Each of these has a known failure mode: clicks spike when a bot wave hits, CR drops when a pixel breaks, approval rate falls when the advertiser starts scrubbing, EPC explodes when a low-volume affiliate suddenly drives qualified leads (which is sometimes a great win and sometimes credential-stuffing).
The second principle is granularity. Alerts on aggregate, network-wide CR are nearly useless: by the time the average moves, individual partners have been broken for hours. Alert on the smallest unit you can act on: offer, sub-affiliate ID, GEO, traffic source, hour. The dimensionality multiplies fast, so compensate with a volume gate: only alert when the deviation is large and the partner has enough volume to be statistically meaningful, typically a minimum of 100 clicks or 30 conversions in the window.
Finally, separate “operational” alerts from “business” alerts. Operational alerts (tracking down, postback queue stalled, sudden zero-traffic from a top affiliate) need to page on-call immediately. Business alerts (CR drift, approval rate trend, EPC anomalies) should land in a daily or hourly digest where an account manager can investigate without interrupt-driven fatigue.
Detection Methods That Work on Real Affiliate Data
For the metrics above, three families of methods cover roughly 90% of useful detection: rolling z-score, seasonal decomposition with residual bounds, and rate-of-change detection.
Rolling z-score is the workhorse. You compute the mean and standard deviation of a metric over a trailing window (typically 7 or 14 days at the same hour-of-day to respect daily seasonality), then flag the current observation if it sits more than 3 standard deviations from that mean. It is dead simple in SQL or pandas, handles slow drift naturally (the window moves forward), and can be deployed in an afternoon. Its weakness is that it assumes roughly Gaussian behavior; for sparse data (sub-affiliates with low volume), it produces false positives unless you add a minimum-volume gate.
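As a rough illustration of the rolling z-score with a volume gate, here is a minimal sketch in plain Python. The function name `zscore_flag`, its parameters, and the sample numbers are assumptions made for this example; in practice the history would come from your warehouse, restricted to the same hour-of-day over the trailing window.

```python
from statistics import mean, stdev


def zscore_flag(history, current, clicks, min_clicks=100, threshold=3.0):
    """Return (is_anomaly, z) for one metric value.

    history: the metric at the same hour-of-day on each day of the
             trailing window (for example 7 or 14 values).
    current: the observation being checked.
    clicks:  click volume behind the current observation.
    """
    # Volume gate: sparse slices are too noisy to judge.
    if clicks < min_clicks or len(history) < 2:
        return False, None
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat baseline: z is undefined, so flag any change at all.
        return current != mu, None
    z = (current - mu) / sigma
    return abs(z) > threshold, z


# Hypothetical CR at 09:00 on each of the last 7 days, then today's value.
history = [0.034, 0.031, 0.036, 0.033, 0.035, 0.032, 0.034]
flagged, z = zscore_flag(history, current=0.19, clicks=450)
print(flagged, z)
```

The same call with `clicks=40` returns `(False, None)` because the gate rejects the slice before any statistics are computed. The defaults of 100 clicks and 3 standard deviations follow the figures in the text and are starting points to tune.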
Seasonal decomposition matters because affiliate traffic is intensely seasonal — hourly, daily, and weekly. STL decomposition (or the simpler Prophet model, which Facebook open-sourced and which works without much tuning) splits a time series into trend, seasonal component, and residual. You alert on the residual, not the raw value. This dramatically reduces false positives around predictable patterns like Sunday-night traffic dips or weekday morning spikes. For most networks, weekly Prophet retraining is enough; daily is overkill.
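To make the idea of alerting on the residual concrete, the sketch below uses a deliberately simplified stand-in for STL or Prophet: the seasonal baseline is the median of past values in the same (weekday, hour) slot, and the residual is compared against a band built from the median absolute deviation. This is not STL and not Prophet; the function name, the multiplier `k=4.0`, and the synthetic traffic pattern are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def seasonal_residual_flag(history, ts, value, k=4.0):
    """Flag `value` at time `ts` if its residual from the seasonal baseline is large.

    history: list of (datetime, value) hourly observations covering several weeks.
    Returns (is_anomaly, residual); (False, None) if the slot was never seen.
    """
    slots = defaultdict(list)
    for t, v in history:
        slots[(t.weekday(), t.hour)].append(v)
    baseline = {slot: median(vs) for slot, vs in slots.items()}

    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return False, None

    # Residuals of the history itself define what "normal noise" looks like.
    residuals = [v - baseline[(t.weekday(), t.hour)] for t, v in history]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation

    residual = value - baseline[slot]
    if scale == 0:
        return residual != 0, residual
    return abs(residual) > k * scale, residual


# Synthetic 4 weeks of hourly clicks: business-hours bump, Sunday dip, small noise.
start = datetime(2024, 1, 1)  # a Monday
history = []
for i in range(24 * 28):
    t = start + timedelta(hours=i)
    base = 150 if 9 <= t.hour < 18 else 100
    if t.weekday() == 6:
        base -= 30
    history.append((t, base + (i * 7) % 5 - 2))

sunday_night = seasonal_residual_flag(history, datetime(2024, 2, 4, 23), 70)
monday_drop = seasonal_residual_flag(history, datetime(2024, 1, 29, 10), 60)
print(sunday_night, monday_drop)
```

A value of 70 on Sunday night matches the seasonal baseline and is not flagged, while 60 on a Monday morning is far outside the residual band. A real decomposition additionally separates trend from seasonality, which this slot-median shortcut does not attempt.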
Rate-of-change detection is the cheap but critical layer for tracking-break scenarios. If a metric drops to zero or doubles within a single 15-minute bucket against a baseline that suggests gradual change, page someone: this is almost always a postback failure, S2S endpoint outage, or domain block. Don’t try to model this with anything sophisticated; a simple “if value drops below 10% of the trailing 4-hour median, alert immediately” rule will catch nearly every real outage with very few false positives.
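The rule quoted above fits in a few lines. The following is a minimal sketch, assuming 15-minute buckets so that the trailing 4 hours are 16 values; the function name `rate_of_change_alert` and the sample counts are made up for the example, and the spike ratio of 2.0 mirrors the “doubles” wording.

```python
from statistics import median


def rate_of_change_alert(trailing_buckets, current, floor_ratio=0.10, spike_ratio=2.0):
    """Classify the current 15-minute bucket against the trailing median.

    trailing_buckets: metric values for the preceding buckets
                      (16 values cover 4 hours of 15-minute buckets).
    Returns "drop", "spike", or None.
    """
    if not trailing_buckets:
        return None
    base = median(trailing_buckets)
    if base <= 0:
        # No traffic expected, so a zero is not an outage signal.
        return None
    if current < floor_ratio * base:
        return "drop"
    if current >= spike_ratio * base:
        return "spike"
    return None


# Hypothetical conversions per 15-minute bucket over the last 4 hours.
trailing = [120, 118, 125, 122, 119, 121, 117, 123,
            120, 124, 118, 122, 121, 119, 120, 123]
print(rate_of_change_alert(trailing, 0))    # tracking-break candidate
print(rate_of_change_alert(trailing, 118))  # normal bucket
```

Using the median rather than the mean keeps one earlier bad bucket from dragging the baseline down and masking a continuing outage.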
For fraud-style detection (sudden CR spikes, suspicious sub-affiliate behavior), add a “compared to peer group” check: a sub-affiliate’s CR should be flagged not just against its own history but against the median of all sub-affiliates on the same offer in the same GEO. A 20% CR is fine if everyone else is at 18%; the same 20% is a red flag if peers sit at 4%.
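One way to sketch the peer-group check is to compare each sub-affiliate's CR with the median CR of the other sub-affiliates on the same offer and GEO. The function name, the ratio threshold of 3.0, the minimum of 5 peers, and the sample IDs are all assumptions for illustration; the input is assumed to be volume-gated already.

```python
from statistics import median


def peer_outliers(cr_by_sub, ratio=3.0, min_peers=5):
    """Return {sub_id: (cr, peer_median)} for sub-affiliates far above their peers.

    cr_by_sub: {sub_affiliate_id: CR} for ONE offer in ONE GEO.
    """
    flagged = {}
    for sub_id, cr in cr_by_sub.items():
        # Exclude the sub-affiliate itself so it cannot pull its own benchmark up.
        peers = [v for k, v in cr_by_sub.items() if k != sub_id]
        if len(peers) < min_peers:
            continue
        peer_median = median(peers)
        if peer_median > 0 and cr / peer_median >= ratio:
            flagged[sub_id] = (cr, peer_median)
    return flagged


# Peers sit around 4%, one sub-affiliate reports 20%.
low_peers = {"s1": 0.040, "s2": 0.038, "s3": 0.045,
             "s4": 0.041, "s5": 0.036, "s6": 0.20}
# Everyone converts around 18%, so 20% is unremarkable.
high_peers = {"s1": 0.18, "s2": 0.17, "s3": 0.19,
              "s4": 0.18, "s5": 0.185, "s6": 0.20}

print(peer_outliers(low_peers))
print(peer_outliers(high_peers))
```

The first call flags only `s6`; the second returns an empty dict, which matches the 20% versus 18% and 20% versus 4% cases described above.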
Where to Build It: BigQuery, Looker, or a Dedicated Tool
Most affiliate programs already have their reporting in either a data warehouse (BigQuery, Snowflake, ClickHouse) or directly inside a tracking platform (Voluum, Affise, Everflow, TUNE, formerly HasOffers). Where you build detection should follow where your data already lives, not where the fanciest tools sit.
If you have a warehouse, write detection as scheduled SQL. An hourly scheduled query computes rolling z-scores per offer × sub-affiliate × GEO and writes flagged rows to an `alerts` table, and a small Python script or Cloud Function reads that table and posts to Slack. That’s the whole system. It costs almost nothing, lives next to your data, and survives team turnover because it’s just SQL. BigQuery scheduled queries plus a Slack webhook is a production-grade setup that takes one engineer about two days to ship.
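The reader half of that setup can be very small. The sketch below assumes flagged rows have already been fetched from the `alerts` table as dictionaries; the column names (`offer_id`, `sub_id`, `geo`, `metric`, `z`) and the webhook URL are hypothetical. It relies only on the fact that a Slack incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request


def build_slack_request(webhook_url, alert_rows):
    """Build (but do not send) the POST request for a Slack incoming webhook."""
    lines = [
        f"offer {r['offer_id']} / sub {r['sub_id']} / {r['geo']}: "
        f"{r['metric']} z={r['z']:.1f}"
        for r in alert_rows
    ]
    payload = {"text": "Affiliate anomalies\n" + "\n".join(lines)}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_alerts(webhook_url, alert_rows):
    """Send flagged rows to Slack; returns the HTTP status, or None if nothing to send."""
    if not alert_rows:
        return None
    request = build_slack_request(webhook_url, alert_rows)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical rows as they might come back from the alerts table.
rows = [{"offer_id": 4421, "sub_id": "sub_87", "geo": "DE", "metric": "CR", "z": 6.4}]
request = build_slack_request("https://hooks.slack.com/services/EXAMPLE", rows)
print(request.data.decode("utf-8"))
```

Separating `build_slack_request` from `post_alerts` keeps the message formatting testable without touching the network.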
If your data lives in the tracking platform and you can’t easily extract it, most modern affiliate platforms (Affise, Everflow, Tune) expose webhook events and a reporting API. You can set up a thin Python service that polls the API every 15 minutes, computes deviations in pandas, and emits alerts. It’s less elegant than warehouse-native detection but works fine for networks under ~10M clicks per month.
Dedicated anomaly detection products (Anodot, Outlier.ai, Metaplane, Datadog’s anomaly monitors) become worth it when you have more than ~20 metrics × hundreds of dimensions to monitor and a team that does not want to maintain detection code. They handle seasonality, drift, and alert grouping out of the box. The trade-off is cost (often four-figure monthly) and the usual SaaS lock-in. For most affiliate operations under $5M annual revenue, the DIY warehouse approach wins on cost-per-detection.
Looker Studio (or Looker, Tableau, Metabase, Power BI) is good for visualizing anomalies once detected but is a poor primary detection layer — dashboards don’t run when no one is looking at them. Use BI tools as the investigation surface that an alert links to (“CR anomaly detected on offer 4421 — open dashboard”), not as the alerting layer itself.
Designing Alerts That Actually Get Acted On
The hardest part of anomaly detection is not the math — it’s the operational design that keeps alerts useful month after month. Most teams ship detection, get great signal for two weeks, then quietly start ignoring alerts as noise creeps in. A handful of principles prevent that decay.
First, every alert must include enough context to triage in under 30 seconds: which dimension fired (offer + sub-affiliate + GEO + hour), the current value, the baseline it deviated from, the magnitude of the deviation in standard deviations or percent, and a direct link to a Looker or Metabase dashboard prefiltered to the affected slice. An alert that just says “CR anomaly on offer 4421” forces the recipient to open a tool and investigate from scratch — that is the moment alerts start getting ignored.
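As an illustration of an alert that carries its own triage context, here is a minimal sketch. The `Alert` fields, the query parameter names, and the dashboard URL are assumptions; a real prefiltered link depends on how your BI tool encodes filters.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Alert:
    metric: str
    offer_id: str
    sub_id: str
    geo: str
    hour_utc: str
    current: float
    baseline: float
    z: float


def format_alert(alert, dashboard_base):
    """Render dimension, current value, baseline, magnitude, and a dashboard link."""
    if alert.baseline:
        pct = f"{(alert.current - alert.baseline) / alert.baseline * 100:+.0f}%"
    else:
        pct = "n/a"  # percent change is undefined against a zero baseline
    link = dashboard_base + "?" + urlencode({
        "offer": alert.offer_id,
        "sub": alert.sub_id,
        "geo": alert.geo,
        "hour": alert.hour_utc,
    })
    return (
        f"{alert.metric} anomaly: offer {alert.offer_id} / sub {alert.sub_id} / "
        f"{alert.geo} / {alert.hour_utc}\n"
        f"current {alert.current:.3f} vs baseline {alert.baseline:.3f} "
        f"({pct}, z={alert.z:+.1f})\n"
        f"dashboard: {link}"
    )


alert = Alert("CR", "4421", "sub_87", "DE", "2024-05-14T09:00Z", 0.19, 0.034, 6.4)
message = format_alert(alert, "https://bi.example.com/dashboards/affiliate-cr")
print(message)
```

Everything the recipient needs for the first 30 seconds is in the message body, and the link opens the affected slice directly.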
Second, route by severity, not by metric. Operational alerts that mean “money is being lost right now” go to PagerDuty or an on-call Slack channel. Business alerts go to a dedicated `#affiliate-anomalies` channel that account managers check during the workday. A daily digest aggregates lower-priority deviations into a single morning summary. Mixing all three together is what burns teams out.
Third, build feedback loops. Every alert needs a “true positive / false positive” reaction (a Slack emoji works fine). Once a week, review the false positive list and tune: tighten thresholds, raise minimum-volume gates, exclude known-volatile partners, add suppressions for scheduled advertiser maintenance. Without this loop, the same false positive fires every day and the team learns to mute the channel.
Fourth, suppress correlated alerts. If a tracking outage takes down 40 offers simultaneously, you want one parent alert, not 40 child alerts. Most alerting platforms support grouping; if you’re rolling your own, a 15-minute deduplication window keyed on the root cause (advertiser, tracker, GEO) is usually enough.
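If you are rolling your own, the deduplication window can be sketched as a small in-memory class keyed on the root cause. The class name and the key format are assumptions; a production version would likely persist state somewhere shared rather than in process memory.

```python
from datetime import datetime, timedelta


class AlertDeduplicator:
    """Let one alert per root-cause key through per time window."""

    def __init__(self, window=timedelta(minutes=15)):
        self.window = window
        self._last_sent = {}

    def should_send(self, root_cause_key, now):
        last = self._last_sent.get(root_cause_key)
        if last is not None and now - last < self.window:
            # Same root cause fired recently: suppress the child alert.
            return False
        self._last_sent[root_cause_key] = now
        return True


dedup = AlertDeduplicator()
t0 = datetime(2024, 5, 14, 3, 0)

first = dedup.should_send(("tracker", "postback-eu"), t0)
repeat = dedup.should_send(("tracker", "postback-eu"), t0 + timedelta(minutes=5))
other = dedup.should_send(("advertiser", "adv_12"), t0 + timedelta(minutes=5))
later = dedup.should_send(("tracker", "postback-eu"), t0 + timedelta(minutes=16))
print(first, repeat, other, later)
```

Suppressed alerts do not extend the window in this sketch, so a long outage re-alerts every 15 minutes instead of going silent after the first message.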
A Practical 30-Day Implementation Plan
Week one: inventory. List every metric you currently report on, mark which ones have a real action attached when they move, and discard the rest. The output is a short list — usually six to ten metrics — that genuinely deserve detection. Decide where your data lives, who owns the alerting channel, and who is on-call for tracking outages.
Week two: ship the simplest possible detection for two or three of the highest-priority metrics. Rolling 7-day z-score, minimum volume gate, Slack webhook. Don’t try to be clever. Run it in shadow mode for the first few days — log alerts to a private channel only, review them yourself, tune thresholds until the false positive rate is under 30%.
Week three: turn on alerts for the team and add a feedback reaction workflow. Add rate-of-change detection for the operational metrics (anything where “value drops to zero” is meaningful). Wire on-call routing for the operational class. Start a weekly review meeting — fifteen minutes, just to look at false positives and the alerts that fired.
Week four: expand coverage. Add seasonal decomposition (Prophet) for the metrics where z-score is fighting strong daily or weekly patterns. Add peer-group comparison for fraud-relevant metrics like CR by sub-affiliate. Document the runbook: who responds to which alert class, what the first three diagnostic steps are, and how to suppress during known maintenance windows.
At the end of thirty days you should have a detection system that catches roughly 80% of meaningful anomalies within the first hour of occurrence, a team that trusts the alerts, and a documented review cadence that keeps the system healthy as your data and partner mix change. The remaining 20% — the subtle, slow-moving fraud patterns and creative coordination problems — is where dedicated tools and more sophisticated models start to pay off. But you should never reach for those until the simple version is running, trusted, and acted on.