Active checks and datasets
Last modified on 09-Jan-25
Soda’s licensing model can include volume-based measures of active checks, or a similar model based on active datasets.
An active dataset is one for which Soda has executed at least one check, excluding empty datasets. A dataset counts as active if you add configuration for:
- a check
- automated monitoring checks; see Add automated monitoring checks
- an anomaly detection dashboard (available in 2025); see Activate anomaly dashboards
An active check is one that Soda has executed during a scan at least once in the past 90 days. A single check, whether it has been executed during one scan, fifty scans, or five hundred scans in the last 90 days counts as one active check.
A single check is identifiable as a test that yields a single result.
A check with one or more alert configurations counts as a single check. The following is an example of a single active check as it only ever yields one result: pass, warn, fail, or error. Note, A check that results in an error counts as an active check. Soda executes the check during a scan in order to yield a result; if the result is an error, it is still a result.
checks for dim_reseller:
- duplicate_count(phone):
warn: when between 1 and 10
fail: when > 10
A check that is included as part of a for each configuration yields a single result for each dataset against which it is executed. The following example produces four check results and, thus, has four active checks and four active datasets.
for each dataset T:
datasets:
- dim_employee
- dim_customer
- dim_product
- dim_reseller
checks:
- row_count:
fail:
when < 5
warn:
when > 10
Similarly, a single check that is included in a scan against two data sources, or two environments such as staging and production, counts as two active checks for two active datasets. The following example checks.yml
file contains as a single check. The scan commands that follow instruct Soda to execute the check on two different environments which counts as two active checks for two active datasets. See also: Configure the same scan in multiple environments.
checks for dim_customer:
- row_count > 0
soda scan -d snowflake_prod -c configuration.yml -s prod_run checks.yml
soda scan -d snowflake_staging -c configuration.yml -s stage_run checks.yml
A check that involves data comparison between multiple datasets in the same, or different, data sources counts as a single check. The following example has four checks, two cross checks and two reference checks, and counts as two active datasets.
checks for dim_customer:
- row_count same as dim_department_group
- row_count same as retail_customers in aws_postgres_retail
checks for dim_department_group:
- values in (department_group_name) must exist in dim_employee (department_name)
- values in (birthdate) must not exist in dim_department_group_prod (birthdate)
Similarly, a reconciliation check that compares data between source and target datasets in the same, or different, data sources counts as a single active check running against a single active dataset which, for this type of check, is the target dataset. The following example has five active checks for five active datasets.
reconciliation Production:
datasets:
source:
dataset: dim_customer
datasource: mysql_adventureworks
target:
dataset: dim_customer
datasource: snowflake_retail
checks:
- row_count diff = 0
- duplicate_count(last_name):
fail: when diff > 10%
warn: when diff is between 5% and 9%
- avg(total_children) diff < 10
- rows diff < 5
- schema:
types:
- source: bit
target: boolean
- source: enum
target: string
Where a check involves grouping its results by category, as in a Group By configuration, the check itself still counts as a single check. The following example has one active check for one active dataset.
checks for fact_internet_sales:
- group by:
group_limit: 10
query: |
SELECT sales_territory_key, AVG(discount_amount) as average_discount
FROM fact_internet_sales
GROUP BY sales_territory_key
fields:
- sales_territory_key
checks:
- average_discount:
fail: when > 40
name: Average discount percentage is less than 40% (grouped-by sales territory)
Go further
- Access information about optional configurations that you can use in SodaCL checks.
- Need help? Join the Soda community on Slack.
Was this documentation helpful?
What could we do to improve this page?
- Suggest a docs change in GitHub.
- Share feedback in the Soda community on Slack.
Documentation always applies to the latest version of Soda products
Last modified on 09-Jan-25