
Add automated monitoring checks

Last modified on 23-Feb-24

Use automated monitoring checks to instruct Soda to automatically check for row count anomalies and schema changes in a dataset.
Requires Soda Agent

automated monitoring:
  datasets:
    - include %
    - exclude test%

About automated monitoring checks
Add automated monitoring checks
Add quotes to all datasets
Go further

About automated monitoring checks

When you add automated monitoring checks to your data source in Soda Cloud, Soda prepares and executes two checks on every dataset that your configuration includes:

Anomaly score check on row count: This check counts the number of rows in a dataset during a scan and flags any count that is anomalous relative to previous measurements of the row count metric. Refer to Anomaly score checks for details.
Anomaly score checks require a minimum of four data points (four scans at stable intervals) to establish a baseline against which to gauge anomalies. If you do not see check results immediately, allow Soda Library to accumulate the necessary data points for relative comparison.
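For comparison, a hand-written check that does roughly the same thing for a single dataset might look like the sketch below. It assumes the SodaCL anomaly score syntax and a hypothetical dataset named orders. Automated monitoring creates its own check for you, so you do not need to add this to a checks YAML file.

```yaml
# Hypothetical checks YAML; assumes SodaCL anomaly score syntax.
# Automated monitoring configures an equivalent row count check for you.
checks for orders:
  - anomaly score for row_count < default
```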

Schema evolution check: This check monitors datasets for schema changes, including column additions, column deletions, data type changes, and index changes. By default, this automated check fails if a column is deleted or if a column's type or index changes; it warns if a column is added. Refer to Schema checks for details.
Schema checks require a minimum of one data point to use as a baseline against which to gauge schema changes. Because the first scan only establishes that baseline, expect check results after you have scanned the dataset at least twice.
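To make the default behavior concrete, the sketch below expresses it as a hand-written schema check for a hypothetical dataset named orders. It assumes the SodaCL schema check syntax with "when schema changes" lists; it is an illustration of the warn and fail conditions described above, not the exact configuration that automated monitoring generates.

```yaml
# Hypothetical checks YAML; assumes SodaCL schema check syntax.
# Mirrors the default automated behavior described above.
checks for orders:
  - schema:
      # An added column produces a warning
      warn:
        when schema changes:
          - column add
      # A deleted column, or a changed type or index, produces a failure
      fail:
        when schema changes:
          - column delete
          - column type change
          - column index change
```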

Add automated monitoring checks

You add automated monitoring checks as part of the guided workflow to create a new data source. Navigate to your avatar > Data Sources > New Data Source to begin.

In step 4 of the guided workflow, you can list the datasets to which you want Soda to automatically add anomaly score and schema evolution checks. The example configuration below uses a wildcard character (%) to instruct Soda Library to execute automated monitoring checks against all datasets whose names begin with prod, and not to execute them against any dataset whose name begins with test.

automated monitoring:
  datasets:
    - include prod%
    - exclude test%


You can also specify individual datasets to include or exclude, as in the following example.

automated monitoring:
  datasets:
    - include orders
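Because datasets is a list, you can likely mix individual dataset names and wildcard patterns in one configuration. The sketch below reuses only the include and exclude entries shown above; the dataset names orders and customers are hypothetical. It includes those two datasets and every dataset whose name begins with prod, and excludes any dataset whose name begins with test.

```yaml
automated monitoring:
  datasets:
    # Individual datasets (hypothetical names)
    - include orders
    - include customers
    # Wildcard patterns, as in the earlier example
    - include prod%
    - exclude test%
```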

Scan results in Soda Cloud

To review the check results for automated monitoring checks in Soda Cloud, you can:

  • navigate to the Checks dashboard to see the check results
  • navigate to the Datasets dashboard to find the check results


Add quotes to all datasets

If your dataset names include white space or special characters, you must wrap those names in quotes whenever you identify them to Soda, such as in a checks YAML file.
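For example, a checks YAML file that targets a dataset with a space in its name might wrap the name in double quotes, as in the sketch below. The dataset name order details and the row_count > 0 check are hypothetical illustrations.

```yaml
# Hypothetical checks YAML for a dataset whose name contains a space;
# the double quotes keep the full name together.
checks for "order details":
  - row_count > 0
```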

To add the necessary quotes to the names of datasets that Soda acts upon automatically, such as when it discovers, profiles, or samples datasets, or creates automated monitoring checks, add a quote_tables configuration to your data source, as in the following example.

data_source soda_demo:
  type: sqlserver
  host: localhost
  username: ${SQL_USERNAME}
  password: ${SQL_PASSWORD}
  quote_tables: true


Go further



Documentation always applies to the latest version of Soda products