Cross checks

Use a SodaCL cross check to compare row counts across datasets in the same, or different, data sources.

Use a cross check to compare row counts between datasets within the same, or different, data sources.

Define cross checks

In the context of SodaCL check types, cross checks are unique. This check employs the row_count metric and is limited in its syntax variation, with only a few mutable parts to specify dataset and data source names.

The example check below compares the volume of rows in two datasets in the same data source. If the row count in the dim_department_group is not the same as in dim_customer, the check fails.

checks for dim_customer:
  - row_count same as dim_department_group

You can use cross checks to compare row counts between datasets in different data sources, as in the example below.

In the example, retail_customers is the name of the other dataset, and aws_postgres_retail is the name of the data source in which retail_customers exists.

checks for dim_customer:
  - row_count same as retail_customers in aws_postgres_retail

If you wish to compare row counts of datasets in different data sources, you must have configured a connection to both data sources. Soda needs access to both data sources in order to execute a cross check between data sources.
The data sources do not need to be the same type; you can compare a dataset in a PostgreSQL data source to a dataset in a BigQuery data source.

Optional check configurations

Supported

Configuration

Documentation

✓

Define a name for a cross check; see example.

Customize check names

✓

Add an identity to a check.

Add a check identity

Define alert configurations to specify warn and fail alert conditions.

Apply an in-check filter to return results for a specific portion of the data in your dataset.

✓

Use quotes when identifying dataset or column names; see example. Note that the type of quotes you use must match that which your data source uses. For example, BigQuery uses a backtick (`) as a quotation mark.

Use quotes in a check

Use wildcard characters ( % or * ) in values in the check.

Use for each to apply schema checks to multiple datasets in one scan.

✓

Apply a dataset filter to partition data during a scan; see example.

Example with check name

checks for dim_customer:
  - row_count same as retail_customers in aws_postgres_retail:
      name: Cross check customer datasets

Example with quotes

checks for dim_customer:
  - row_count same as "dim_department_group"

Example with dataset filters

filter dim_promotion [daily]:
  where: discount_pct = '0.5'

filter retail_orders [daily]:
  where: discount = `50'

checks for dim_promotion [daily]:
  - row_count same as retail_orders [daily] in aws_postgres_retail:
      name: Cross check between data sources

Go further

Learn more about SodaCL metrics and checks in general.
Learn more about Comparing data using SodaCL.
Use a schema check to discover missing or forbidden columns in a dataset.
Reference tips and best practices for SodaCL.

Need help? Join the Soda community on Slack.

PreviousCheck template NextDistribution checks

Last updated 1 month ago

Was this helpful?