
For each

Last modified on 27-Sep-23

Use a for each configuration to execute checks against multiple datasets during a scan.

for each dataset T:
  datasets:
    - dim_products%
    - fact%
    - exclude fact_survey_response
  checks:
    - row_count > 0

Define a for each configuration
Limitations and specifics
Optional check configurations
For each results in Soda Cloud
Go further

Define a for each configuration

Add a for each section to your checks YAML file to specify a list of checks you wish to execute on multiple datasets.

  1. Add a for each dataset T section header anywhere in your YAML file. The T is an arbitrary identifier whose only purpose is to give each for each configuration a unique name.
  2. Under the section header, add two nested keys: datasets and checks.
  3. Nested under datasets, add a list of datasets against which to run the checks. Refer to the example below, which illustrates how to use include and exclude configurations and wildcard characters (%).
  4. Nested under checks, write the checks you wish to execute against all the datasets listed under datasets.
for each dataset T:
  datasets:
    # include a specific dataset
    - dim_customers
    # include all datasets matching the wildcard expression
    - dim_products%
    # (optional) explicitly add the word include to make the list more readable
    - include dim_employee
    # exclude a specific dataset
    - exclude fact_survey_response
    # exclude any datasets matching the wildcard expression
    - exclude prospective_%
  checks:
    - row_count > 0
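The include, exclude, and wildcard rules above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Soda Library's implementation: `resolve_datasets` and `pattern_to_regex` are hypothetical helpers, `%` is assumed to match any run of characters (including none), matching ignores case as noted under Limitations and specifics, and an exclude entry is assumed to win over any include regardless of the order of the entries.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    # Assumption: % matches any run of characters (including none) and every
    # other character is literal; matching ignores case.
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(literal_parts) + "$", re.IGNORECASE)


def resolve_datasets(entries: list[str], available: list[str]) -> list[str]:
    """Return the names in `available` selected by a for each `datasets` list.

    Hypothetical helper, not Soda Library code. Assumption: a dataset is
    selected when it matches at least one include entry and no exclude
    entry, regardless of the order of the entries.
    """
    includes = []
    excludes = []
    for entry in entries:
        keyword, _, rest = entry.strip().partition(" ")
        if keyword == "exclude" and rest:
            excludes.append(pattern_to_regex(rest.strip()))
        elif keyword == "include" and rest:
            includes.append(pattern_to_regex(rest.strip()))
        else:
            # A bare entry, such as dim_customers, is an implicit include.
            includes.append(pattern_to_regex(entry.strip()))
    return [
        name
        for name in available
        if any(p.match(name) for p in includes)
        and not any(p.match(name) for p in excludes)
    ]


# Hypothetical table names; the entries mirror the first example on this page.
tables = ["dim_products", "fact_sales", "fact_survey_response", "dim_customer"]
selected = resolve_datasets(["dim_products%", "fact%", "exclude fact_survey_response"], tables)
print(selected)  # ['dim_products', 'fact_sales']
```

In this model, fact_survey_response matches the include fact% but is dropped by its exclude entry, and dim_customer is never selected because no include entry matches it.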

Limitations and specifics

  • For each is not compatible with dataset filters.
  • Soda Library matches dataset names case-insensitively.
  • You cannot use quotes around dataset names in a for each configuration.
  • If any of your checks specify column names as arguments, make sure each column exists in every dataset listed under the datasets key.
  • To add multiple for each configurations to your checks YAML file, add another for each section header with a different letter identifier, such as for each dataset R.
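Because every check under checks runs against every dataset that the datasets list selects, a quick set comparison can flag a dataset that lacks a column before you run a scan. This is a minimal sketch: `datasets_missing_column` is a hypothetical helper, and the schemas are made-up sample data that you would gather yourself; Soda Library does not provide this function.

```python
def datasets_missing_column(schemas: dict[str, set[str]], column: str) -> list[str]:
    """Return, sorted, the datasets whose columns do not include `column`.

    Hypothetical pre-flight helper: `schemas` maps each dataset listed under
    `datasets` to its column names, gathered however you like (for example,
    from your warehouse's information schema). Names are compared exactly.
    """
    return sorted(name for name, columns in schemas.items() if column not in columns)


# Hypothetical schemas for the two datasets used in the examples on this page.
schemas = {
    "dim_employee": {"employee_key", "vacation_hours", "sales_territory_key"},
    "dim_customer": {"customer_key", "first_name"},
}
print(datasets_missing_column(schemas, "vacation_hours"))  # ['dim_customer']
```

Here a check such as max(vacation_hours) < 80 would be safe to run against dim_employee but not against dim_customer.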

Optional check configurations

Supported:
  • Define a name for a for each check; see example. Documentation: Customize check names
  • Add an identity to a check. Documentation: Add a check identity
  • Define alert configurations to specify warn and fail alert conditions; see example. Documentation: Add alert configurations
  • Apply an in-check filter to return results for a specific portion of the data in your dataset; see example. Documentation: Add an in-check filter
  • Use wildcard characters (%) in values in the for each configuration; see example.

Not supported:
  • Use quotes when identifying dataset or column names.
  • Apply a dataset filter to partition data during a scan.

Example with check name

for each dataset T:
  datasets:
    - dim_employee

  checks:
    - max(vacation_hours) < 80:
        name: Too many vacation hours for US Sales

Example with alert configuration

for each dataset T:
  datasets:
    - dim_employee
    - dim_customer

  checks:
    - row_count:
        fail:
          when < 5
        warn:
          when > 10

Example with in-check filter

for each dataset T:
  datasets:
    - dim_employee

  checks:
    - max(vacation_hours) < 80:
        filter: sales_territory_key = 11
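The in-check filter restricts the rows the check considers, roughly like adding WHERE sales_territory_key = 11 to the query that computes max(vacation_hours). The sketch below models that reading in memory. `filtered_max` is a hypothetical helper and the rows are invented sample data; Soda Library runs checks against your data source, which this sketch does not model.

```python
from typing import Any, Optional


def filtered_max(rows: list[dict[str, Any]], column: str, key: str, value: Any) -> Optional[Any]:
    """Return max(column) over the rows where row[key] == value, or None if none match.

    Hypothetical helper that imitates `filter: sales_territory_key = 11` in memory.
    """
    values = [row[column] for row in rows if row[key] == value]
    return max(values) if values else None


# Made-up dim_employee rows for illustration only.
rows = [
    {"sales_territory_key": 11, "vacation_hours": 40},
    {"sales_territory_key": 11, "vacation_hours": 75},
    {"sales_territory_key": 4, "vacation_hours": 99},
]
result = filtered_max(rows, "vacation_hours", "sales_territory_key", 11)
print(result, result < 80)  # 75 True
```

The row with 99 vacation hours belongs to territory 4, so the filter excludes it and the check passes, even though the unfiltered maximum would fail it.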

Example with wildcard

for each dataset T:
  datasets:
    - dim_%

  checks:
    - row_count > 1

For each results in Soda Cloud

Soda Library pushes the check results for each dataset to Soda Cloud, where each check appears in the Checks dashboard with an icon indicating its latest scan result. Filter the results by dataset to review dataset-specific results.

for each dataset T:
  datasets:
    - dim_employee
    - dim_customer

  checks:
    - row_count > 1
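With two datasets and one check, the configuration above produces one result per dataset, which is why filtering by dataset in Soda Cloud isolates each dataset's outcome. The hypothetical `expand_for_each` helper below sketches that pairing, assuming one result per dataset and check; it is not part of Soda Library.

```python
from itertools import product


def expand_for_each(datasets: list[str], checks: list[str]) -> list[tuple[str, str]]:
    """Pair every listed dataset with every check, one pair per expected result.

    Hypothetical helper: it models the assumption that a for each configuration
    yields one check result per dataset and check.
    """
    return list(product(datasets, checks))


pairs = expand_for_each(["dim_employee", "dim_customer"], ["row_count > 1"])
print(pairs)  # [('dim_employee', 'row_count > 1'), ('dim_customer', 'row_count > 1')]

# Filtering by dataset, as in the Checks dashboard, keeps only that dataset's results.
print([check for dataset, check in pairs if dataset == "dim_customer"])  # ['row_count > 1']
```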


Go further



Documentation always applies to the latest version of Soda products