
For each

Last modified on 26-Apr-24

Use a for each configuration to execute checks against multiple datasets during a scan.

for each dataset T:
  datasets:
    - dim_products%
    - fact%
    - exclude fact_survey_response
  checks:
    - row_count > 0

✖️    Requires Soda Core Scientific (included in a Soda Agent)
✔️    Supported in Soda Core
✔️    Supported in Soda Library + Soda Cloud
✔️    Supported in Soda Cloud Agreements + Soda Agent
✖️    Supported by SodaGPT
✖️    Available as a no-code check

Define a for each configuration
Limitations and specifics
Optional check configurations
    Add a dynamic name to for each checks
For each results in Soda Cloud
Go further

Define a for each configuration

Add a for each section to your checks configuration to specify a list of checks you wish to execute on multiple datasets.

  1. Add a for each dataset T section header anywhere in your YAML file. The purpose of the T is only to ensure that every for each configuration has a unique name.
  2. Nested under the section header, add two nested keys, one for datasets and one for checks.
  3. Nested under datasets, add a list of datasets against which to run the checks. Refer to the example below that illustrates how to use include and exclude configurations and wildcard characters (%).
  4. Nested under checks, write the checks you wish to execute against all the datasets listed under datasets.
for each dataset T:
  datasets:
    # include the dataset 
    - dim_customers
    # include all datasets matching the wildcard expression
    - dim_products%
    # (optional) explicitly add the word include to make the list more readable
    - include dim_employee
    # exclude a specific dataset
    - exclude fact_survey_response
    # exclude any datasets matching the wildcard expression
    - exclude prospective_%
  checks:
    - row_count > 0

Limitations and specifics for for each

  • For each is not compatible with dataset filters.
  • Soda matches dataset names case-insensitively.
  • You cannot use quotes around dataset names in a for each configuration.
  • If any of your checks specify column names as arguments, make sure the column exists in all datasets listed under the datasets heading.
  • To add multiple for each configurations, configure another for each section header with a different letter identifier, such as for each dataset R.
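
As a sketch of the last two points, the following checks file defines two for each configurations, distinguished by the identifiers T and R. The column name last_name is hypothetical; the example assumes it exists in both dim_employee and dim_customer, since a check that names a column runs against every dataset in the list.

```yaml
# First configuration, identified by T
for each dataset T:
  datasets:
    - dim_employee
    - dim_customer
  checks:
    # Assumption: last_name is a hypothetical column present in both datasets above
    - missing_count(last_name) = 0

# Second configuration, identified by R
for each dataset R:
  datasets:
    - retail%
  checks:
    - row_count > 0
```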

Optional check configurations

Supported | Configuration | Documentation
✔️ | Define a name for a for each check; see example. | Customize check names
✔️ | Add an identity to a check. | Add a check identity
✔️ | Define alert configurations to specify warn and fail alert conditions; see example. | Add alert configurations
✔️ | Apply an in-check filter to return results for a specific portion of the data in your dataset; see example. | Add an in-check filter
✖️ | Use quotes when identifying dataset or column names. | -
✔️ | Use wildcard characters ( % ) in values in the for each configuration; see example. | -
✖️ | Apply a dataset filter to partition data during a scan. | -
Example with check name

for each dataset T:
  datasets:
    - dim_employee

  checks:
    - max(vacation_hours) < 80:
        name: Too many vacation hours for US Sales

Example with alert configuration

for each dataset T:
  datasets:
    - dim_employee
    - dim_customer

  checks:
    - row_count:
        fail:
          when < 5
        warn:
          when > 10

Example with in-check filter

for each dataset T:
  datasets:
    - dim_employee

  checks:
    - max(vacation_hours) < 80:
        filter: sales_territory_key = 11

Example with wildcard

for each dataset T:
  datasets:
    - dim_%

  checks:
    - row_count > 1

Add a dynamic name to for each checks

To keep your for each check results organized in Soda Cloud, you may wish to dynamically add a name to each check so that you can easily identify to which dataset the check result applies.

For example, if you use for each to execute an anomaly detection check on many datasets, you can use a variable in the syntax of the check name so that Soda dynamically adds a dataset name to each check result.

for each dataset R:
  datasets:
    - retail%
  checks:
    - anomaly detection for row_count:
        name: Row count anomaly for ${R}

For each results in Soda Cloud

Soda pushes the check results for each dataset to Soda Cloud, where each check appears in the Checks dashboard with an icon indicating its latest scan result. Filter the results by dataset to review dataset-specific results.

for each dataset T:
  datasets:
    - dim_employee
    - dim_customer

  checks:
    - row_count > 1


Go further



Documentation always applies to the latest version of Soda products