
Reference checks

Use a reference check to validate that column contents match between datasets in the same data source.

checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)

Define reference checks
Optional check configurations
Go further

Define reference checks

Among SodaCL check types, reference checks are unique: the syntax is fixed, and the only parts you change are the dataset and column names.

The example below checks that the values in the source column, department_group_name, in the dim_department_group dataset exist in the destination column, department_name, in the dim_employee dataset. If any value in the source column is absent from the department_name column, the check fails.

  • SodaCL treats missing values in the source column as invalid.
  • The brackets around column names are optional. They serve as visual aids to improve readability.
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)
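
The logic this check expresses can be sketched in plain Python. This is a conceptual illustration only, not Soda's implementation: the helper name find_missing_references and the sample values are hypothetical, and None stands in for a missing value.

```python
def find_missing_references(source_values, destination_values):
    """Return the source values that have no match in the destination column.

    A missing source value (None) is always reported as invalid,
    mirroring the behavior described in the bullet above.
    """
    allowed = set(destination_values)
    return [v for v in source_values if v is None or v not in allowed]


# Hypothetical sample data, not taken from any real dataset.
department_group_name = ["Sales", "Engineering", None, "Legal"]
department_name = ["Sales", "Engineering", "Marketing"]

failed_rows = find_missing_references(department_group_name, department_name)
check_passes = not failed_rows

print(failed_rows)   # [None, 'Legal']
print(check_passes)  # False
```

In this sketch, "Marketing" appears only in the destination column and does not cause a failure, because the comparison runs in one direction: from source to destination.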

You can use reference checks to compare the values of multiple columns in different datasets, as in the following example. Soda compares the columns in the order you list them, so in the example below, last_name compares to last_name, and first_name compares to first_name.

checks for dim_customers_dev:
  - values in (last_name, first_name) must exist in dim_customers_prod (last_name, first_name)
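
A rough Python sketch of why column order matters follows. It assumes that the listed columns are matched together, row by row, as a combination; the text above states only that columns pair up by position, so treat the tuple comparison as an assumption. The helper name find_missing_rows and the sample rows are hypothetical.

```python
def find_missing_rows(source_rows, destination_rows):
    """Return source rows whose values, taken in the listed column order,
    do not appear as a row in the destination.

    Assumption: columns are compared together as one combination per row.
    """
    allowed = {tuple(row) for row in destination_rows}
    return [tuple(row) for row in source_rows if tuple(row) not in allowed]


# Hypothetical rows in the order (last_name, first_name).
dev_rows = [("Lovelace", "Ada"), ("Ada", "Lovelace")]
prod_rows = [("Lovelace", "Ada"), ("Hopper", "Grace")]

unmatched = find_missing_rows(dev_rows, prod_rows)
print(unmatched)  # [('Ada', 'Lovelace')]
```

The second dev row contains the same two strings as a prod row but in swapped positions, so it does not match: last_name is compared only to last_name, and first_name only to first_name.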

Optional check configurations

Configuration | Documentation
Define a name for a reference check; see example. | Customize check names
Define alert configurations to specify warn and fail alert conditions. | -
Apply a filter to return results for a specific portion of the data in your dataset. | -
Use quotes when identifying dataset or column names; see example. | Use quotes in a check
Use wildcard characters ( % or * ) in values in the check. | -
Use for each to apply reference checks to multiple datasets in one scan. | -
Apply a dataset filter to partition data during a scan; see example. | Scan a portion of your dataset

Example with check name

checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name):
      name: Compare department datasets

Example with quotes

checks for dim_department_group:
  - values in ("department_group_name") must exist in dim_employee ("department_name")

Example with dataset filter

coming soon


Go further


Last modified on 01-Jul-22
