RAD (Record-level Anomaly Detection)

This page explains Record-level Anomaly Detection (RAD) and Soda's anomaly detection capabilities through RAD.

Coming soon!

RAD functionalities will be available soon for Enterprise plan users.

What is RAD?

Ensuring data quality can be difficult, especially when you need broad coverage quickly. Checks and column monitors are great for enforcing specific rules, but they take time to set up and require a deep understanding of your data. Soda’s Record-level Anomaly Detection (RAD) helps you get started fast, providing instant coverage across all columns, rows, and segments, without any configuration.

RAD analyzes historical data to build a picture of what normal data looks like. When incoming rows show unusual patterns, unexpected values, inconsistencies, or errors, RAD automatically triggers an alert and runs a root cause analysis to pinpoint the issue. This gives you quick, actionable insights while you work toward more detailed control with checks and column monitors.
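This page does not document how RAD's algorithm works internally. To build intuition for the general idea of learning "normal" from history and then flagging unusual rows, here is a deliberately simple frequency-based sketch in Python. It is not Soda's method. The function names, the `min_share` threshold, and the "share of unusual rows" value are all hypothetical, and that value is not the Record-level Drift Score.

```python
from collections import Counter


def build_profile(history):
    """Count how often each value appears in each column of the historical rows."""
    profile = {}
    for row in history:
        for column, value in row.items():
            profile.setdefault(column, Counter())[value] += 1
    return profile


def is_unusual(row, profile, n_history, min_share):
    """Flag a row if any of its values is rarer than min_share in the history."""
    for column, value in row.items():
        seen = profile.get(column, Counter())[value]  # 0 for never-seen values
        if seen / n_history < min_share:
            return True
    return False


def unusual_row_share(batch, history, min_share=0.01):
    """Return the share of rows in a new batch that look unusual, plus those rows."""
    profile = build_profile(history)
    flagged = [row for row in batch if is_unusual(row, profile, len(history), min_share)]
    return len(flagged) / len(batch), flagged


# Usage: 100 historical rows, then a new batch with an unseen country and a missing status.
history = [{"country": "US", "status": "paid"} for _ in range(50)] + [
    {"country": "DE", "status": "paid"} for _ in range(50)
]
batch = [
    {"country": "US", "status": "paid"},
    {"country": "XX", "status": "paid"},
    {"country": "DE", "status": None},
    {"country": "DE", "status": "paid"},
]
share, flagged = unusual_row_share(batch, history)
print(share)  # 0.5
```

A real record-level detector would also need to handle numeric values, correlations between columns, and segments; this sketch only checks each value's historical frequency on its own.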

Why use RAD?

- Instant, broad coverage: monitor all columns, rows, and segments at once, detecting both known and unknown issues.
- No configuration needed: get started immediately, with no metrics or checks to define. RAD automatically determines which columns to use.
- One metric to track and alert on: the Record-level Drift Score provides a single, explainable metric for monitoring data health.
  - Assess the impact of data quality issues: easily determine how many rows in your dataset are affected by data quality problems.
  - Prioritize what matters: use the Record-level Drift Score consistently across datasets and data sources to rank and focus on the most critical data quality issues.
  - Reduce false alerts: traditional column-level monitoring increases the risk of false positives with every additional monitor. With RAD, you need only one anomaly detection monitor per dataset, minimizing noise.
  - Optimize compute usage: monitoring a single metric per dataset lowers computational overhead. RAD can also work with sampled data, further reducing processing demands.
- Built-in root cause analysis: quickly understand what changed and why.
- Native support for backfilling and back-testing: automatically generate and assess historical Record-level Drift Scores to review past data quality trends.
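The "Reduce false alerts" point can be quantified with a simple model. Assume, for illustration only, that each monitor independently raises a false alert with the same probability p on every run. The chance that at least one of n monitors raises a false alert on a given run is then 1 - (1 - p)^n. The 1% rate below is a hypothetical value, and real monitors are rarely fully independent, so treat the result as rough intuition rather than a prediction.

```python
def prob_any_false_alert(p, n):
    """Chance that at least one of n independent monitors raises a false alert,
    when each one does so with probability p on a given run."""
    return 1 - (1 - p) ** n


# Hypothetical 1% false-positive rate per monitor per run.
for n in (1, 10, 50):
    print(f"n = {n:>2}: {prob_any_false_alert(0.01, n):.1%}")
# n =  1: 1.0%
# n = 10: 9.6%
# n = 50: 39.5%
```

Under this model, the per-run false-alert risk grows with every monitor added, while a single monitor per dataset keeps n at 1 for that dataset.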

When to use RAD?

Order of operations to achieve the best coverage in the most efficient way:

First, dataset-level metadata metrics: always begin with high-level monitors that verify whether the right amount of data arrived on time and in the correct format. These require no configuration; they only need to be enabled.
Second, RAD: apply Record-level Anomaly Detection to validate the actual content of the data. This step also requires no configuration (only enablement) and provides broad coverage across all columns and segments.
Next, column monitors: apply column-level monitoring to specific use cases where the potential data quality issue and the metric are known, but the metric's values are expected to change over time. Keep these to a minimum, as they are prone to generating false alerts.
Lastly, checks: use checks for critical tables where expectations are clearly defined.

For example, for missing values in an Amount column:

| Data quality tool | Column | Metric | Failure | Configuration |
|---|---|---|---|---|
| Checks | Known | Known | Known | Missing values in Amount < 5% |
| Column monitors | Known | Known | Unknown | Anomaly detection on Amount for missing values |
| RAD monitor | Unknown | Unknown | Unknown | RAD on all columns |

RAD requirements

For a dataset to be monitored by RAD, the following conditions must be met:

Time partition column: the dataset must include a column that partitions data by time (for example, created_at).
Primary key: the dataset must have a primary key to uniquely identify rows.
Diagnostics Warehouse setup: a Diagnostics Warehouse must be configured to store the daily sample, consisting of either primary keys or, ideally, a full copy of the sampled rows.
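Before requesting RAD, you may want to confirm that a dataset meets the first two requirements. The following is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for your warehouse. The function `check_rad_prerequisites`, the `orders` table, and its columns are hypothetical. The sketch checks that both columns exist, that the primary key is non-null and unique, and that the time partition column has no NULLs. It cannot check the Diagnostics Warehouse requirement.

```python
import sqlite3


def check_rad_prerequisites(conn, table, primary_key, time_column):
    """Return a list of problems; an empty list means all checks passed.

    Identifiers are interpolated into SQL, so pass only trusted names.
    """
    problems = []

    # 1. Both columns must exist in the table (row[1] is the column name).
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name, role in ((primary_key, "primary key"), (time_column, "time partition")):
        if name not in columns:
            problems.append(f"missing {role} column: {name}")
    if problems:
        return problems

    # 2. The primary key must be non-null and unique; the time column non-null.
    total, distinct_keys, null_keys, null_times = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {primary_key}), "
        f"COALESCE(SUM({primary_key} IS NULL), 0), "
        f"COALESCE(SUM({time_column} IS NULL), 0) FROM {table}"
    ).fetchone()
    if null_keys:
        problems.append(f"NULL values in primary key column {primary_key}")
    if total - null_keys != distinct_keys:  # COUNT(DISTINCT ...) ignores NULLs
        problems.append(f"duplicate values in primary key column {primary_key}")
    if null_times:
        problems.append(f"NULL values in time partition column {time_column}")
    return problems


# Usage with a hypothetical "orders" table that reuses order_id 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-05-01", 10.0), (2, "2024-05-01", None), (2, "2024-05-02", 5.0)],
)
print(check_rad_prerequisites(conn, "orders", "order_id", "created_at"))
# ['duplicate values in primary key column order_id']
print(check_rad_prerequisites(conn, "orders", "order_id", "updated_at"))
# ['missing time partition column: updated_at']
```

On another warehouse, the same two aggregate queries translate directly, but the column-listing step would use that system's catalog (for example, `information_schema.columns`) instead of SQLite's `PRAGMA table_info`.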

Next: to enable Record-level Anomaly Detection in your organization, reach out to Soda.

