Active checks and datasets

Learn more about active checks and datasets as they are defined in Soda's licensing model.

Soda’s licensing model can include volume-based measures of active checks, or a similar model based on active datasets.

An active dataset is one for which Soda has executed at least one check, excluding empty datasets. A dataset counts as active if you add configuration for:

An active check is one that Soda has executed during a scan at least once in the past 90 days. A single check, whether it has been executed during one scan, fifty scans, or five hundred scans in the last 90 days counts as one active check.

A single check is identifiable as a test that yields a single result.

A check with one or more alert configurations counts as a single check. The following is an example of a single active check as it only ever yields one result: pass, warn, fail, or error. Note, A check that results in an error counts as an active check. Soda executes the check during a scan in order to yield a result; if the result is an error, it is still a result.

checks for dim_reseller:
  - duplicate_count(phone):
      warn: when between 1 and 10
      fail: when > 10

A check that is included as part of a for each configuration yields a single result for each dataset against which it is executed. The following example produces four check results and, thus, has four active checks and four active datasets.

for each dataset T:
  datasets:
    - dim_employee
    - dim_customer
    - dim_product
    - dim_reseller
  checks:
    - row_count:
        fail:
          when < 5
        warn:
          when > 10

Similarly, a single check that is included in a scan against two data sources, or two environments such as staging and production, counts as two active checks for two active datasets. The following example checks.yml file contains as a single check. The scan commands that follow instruct Soda to execute the check on two different environments which counts as two active checks for two active datasets. See also: Configure the same scan in multiple environments.

A check that involves data comparison between multiple datasets in the same, or different, data sources counts as a single check. The following example has four checks, two cross checks and two reference checks, and counts as two active datasets.

Similarly, a reconciliation check that compares data between source and target datasets in the same, or different, data sources counts as a single active check running against a single active dataset which, for this type of check, is the target dataset. The following example has five active checks for five active datasets.

Where a check involves grouping its results by category, as in a group by configuration, the check itself still counts as a single check. The following example has one active check for one active dataset.

Go further

Need help? Join the Soda community on Slack.

Last updated

Was this helpful?