
Detect anomalies

In Soda Cloud, you can create a monitor that automatically detects anomalies in your time-series data.

Anomaly detection is a monitor Evaluation Type powered by a machine learning algorithm that works with measurements taken over time. The algorithm learns the patterns of your data, namely its trends and seasonality, to identify and flag anomalous measurements in time-series data. (“Seasonality” is a common pattern in time-series data: a cyclical variation that recurs regardless of the general direction of the data. For example, the number of orders on your platform might drop clearly every weekend or peak during the holiday season, whether or not your platform is growing overall.)
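To make “trend” and “seasonality” concrete, the following minimal Python sketch separates the two in a small, hypothetical series of daily orders. It is only an illustration of the concepts, not the algorithm Soda Cloud uses; the data and the function names `seasonal_profile` and `cycle_totals` are assumptions made for this example.

```python
from statistics import mean

def seasonal_profile(values, period=7):
    """Average value at each position of a repeating cycle (here: day of week)."""
    buckets = [[] for _ in range(period)]
    for i, value in enumerate(values):
        buckets[i % period].append(value)
    return [mean(bucket) for bucket in buckets]

def cycle_totals(values, period=7):
    """Sum of each full cycle; rising totals indicate an upward trend."""
    return [sum(values[i:i + period]) for i in range(0, len(values) - period + 1, period)]

# Hypothetical data: three weeks of daily orders, Monday first.
orders = [
    100, 104, 98, 102, 101, 40, 38,
    110, 112, 108, 111, 109, 45, 42,
    120, 118, 121, 119, 122, 50, 47,
]

profile = seasonal_profile(orders)
totals = cycle_totals(orders)
print([round(p, 1) for p in profile])  # Saturday and Sunday averages are far below weekdays
print(totals)  # [583, 637, 697]: orders grow week over week
```

The weekday profile captures the seasonal weekend drop, while the weekly totals capture the growth trend; a measurement is only surprising once both patterns are accounted for.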


The algorithm automatically adapts to your data. It learns how your data generally behaves over time to build a model of data patterns, then compares actual measurements to that model. If the two disagree, the algorithm identifies the actual measurement as an anomaly and calculates:

a. the certainty of the algorithm’s model, or how well it knows your data, and
b. the size of the anomaly, or how far the measurement deviates from the expected value.
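The two calculations can be illustrated with a deliberately naive model. In the following Python sketch, which is an assumption and not Soda Cloud’s implementation, the “model” is just the mean and standard deviation of past measurements. The size of an anomaly is the deviation measured in standard deviations, and certainty is a hypothetical proxy that grows as more history accumulates. The function name `assess` is also hypothetical.

```python
import math
from statistics import mean, stdev

def assess(history, actual):
    """Compare a new measurement to a naive model of past measurements.

    Hypothetical illustration: the "model" is only the mean and standard
    deviation of the history, not Soda's actual algorithm.
    """
    if len(history) < 2:
        raise ValueError("need at least two past measurements")
    expected = mean(history)
    spread = stdev(history)
    deviation = abs(actual - expected)
    # (b) Size: how many standard deviations the measurement is from the expected value.
    if spread > 0:
        size = deviation / spread
    else:
        size = 0.0 if deviation == 0 else math.inf
    # (a) Certainty: a crude, hypothetical proxy that grows with the amount of history
    # (0.5 with two points, approaching 1.0 as more measurements arrive).
    certainty = 1 - 1 / len(history)
    return expected, size, certainty

expected, size, certainty = assess([100, 102, 98, 101, 99], actual=110)
print(f"expected={expected:.1f} size={size:.2f} certainty={certainty:.2f}")
# expected=100.0 size=6.32 certainty=0.80
```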

Soda Cloud uses the certainty and size calculations to derive the thresholds for triggering alerts:

  • a large, certain anomaly triggers a critical alert
  • a small, less certain anomaly triggers a warning
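One way to express this mapping in code is sketched below in Python. The cut-off values (`critical_size`, `warning_size`, `min_certainty`) are hypothetical placeholders chosen for illustration; Soda Cloud derives its own thresholds and does not expose them in this form.

```python
from typing import Optional

def alert_level(size: float, certainty: float,
                critical_size: float = 3.0, warning_size: float = 2.0,
                min_certainty: float = 0.8) -> Optional[str]:
    """Map an anomaly's size and certainty to an alert level.

    All cut-off values are hypothetical placeholders, not Soda Cloud's thresholds.
    """
    if size >= critical_size and certainty >= min_certainty:
        return "critical"  # large and certain
    if size >= warning_size:
        return "warning"   # smaller, or large but less certain
    return None            # within the expected range: no alert

print(alert_level(6.3, 0.9))  # critical
print(alert_level(6.3, 0.5))  # warning
print(alert_level(1.0, 0.9))  # None
```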

You can use anomaly detection in a monitor as long as the test that a Soda scan executes produces a numerical measurement that changes regularly over time.

For example, if the test you define in your monitor measures the row count of one of your datasets every hour as part of your transformation pipeline, you can use anomaly detection to discover unexpected volumes of entries in the dataset. If the test measures the price of an asset at the end of each day, you can use anomaly detection to get an alert when the price rises or falls unexpectedly.
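The hourly row count example can be sketched as a simple rolling check in Python. This is a hypothetical stand-in, a rolling z-score detector that ignores seasonality, and not the machine learning algorithm Soda Cloud uses; the row counts and the function name `detect_anomalies` are invented for illustration.

```python
import math
from statistics import mean, stdev

def detect_anomalies(values, window=6, z_threshold=3.0):
    """Return indexes of values that deviate strongly from the recent baseline.

    A hypothetical rolling z-score detector; it ignores seasonality and is not
    Soda's algorithm.
    """
    baseline = []  # measurements accepted as normal so far
    flagged = []
    for i, value in enumerate(values):
        recent = baseline[-window:]
        if len(recent) == window:  # wait until enough history exists
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0:
                z = abs(value - mu) / sigma
            else:
                z = 0.0 if value == mu else math.inf
            if z > z_threshold:
                flagged.append(i)
                continue  # keep anomalies out of the baseline
        baseline.append(value)
    return flagged

# Hypothetical hourly row counts; the load at index 7 delivered far fewer rows.
row_counts = [1000, 1010, 990, 1005, 995, 1002, 998, 120, 1001]
print(detect_anomalies(row_counts))  # [7]
```

Excluding flagged values from the baseline is a design choice: without it, the single low load at index 7 would inflate the spread and mask the next few measurements.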

Use anomaly detection

To use anomaly detection, follow the steps to create a new monitor and select Anomaly Detection as the Evaluation Type in step two of the create flow. Beyond that, you do not need to specify any other details about your data; Soda Cloud automatically begins learning about your data and triggering alerts when Soda scans reveal anomalies.

When you access the Monitors dashboard to review the monitor’s test results, the chart that displays the time-series data and its anomalies gives you the opportunity to manually provide feedback on an anomalous measurement. Your feedback on the accuracy of the calculated anomaly helps the algorithm to adapt its recognition of exceptions and deviations, and refine its thresholds for triggering alerts.
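Soda Cloud does not describe how it applies this feedback internally. As a purely hypothetical illustration of the idea, the following Python sketch raises a z-score threshold just above the largest deviation a user rejected as “not an anomaly”, unless doing so would hide a deviation the user confirmed. The function name `refine_threshold`, the `margin` parameter, and the feedback format are all assumptions.

```python
def refine_threshold(current, feedback, margin=0.1):
    """Adjust a z-score alert threshold using user feedback.

    feedback: list of (z_score, is_real_anomaly) pairs for flagged measurements
    a user has reviewed. Hypothetical illustration only, not Soda's method.
    """
    rejected = [z for z, is_real in feedback if not is_real]
    confirmed = [z for z, is_real in feedback if is_real]
    if not rejected:
        return current  # every reviewed alert was confirmed; keep the threshold
    candidate = max(current, max(rejected) + margin)
    # Do not raise the threshold so far that a confirmed anomaly would be missed.
    if confirmed and candidate >= min(confirmed):
        return current
    return candidate

print(refine_threshold(3.0, [(3.4, False), (6.0, True)]))  # about 3.5
```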

Last modified on 16-Jul-21
