Reconciliation checks
This page documents the contract language syntax for reconciliation.
Available on the 15th of September 2025.
Reconciliation checks validate that a target dataset matches a source dataset, ensuring that data remains consistent after migrations, in pipelines, or during synchronizations. They can be used for both metric-level (aggregate) validation and row-level (record-by-record) validation.
Learn more about the use cases and performance considerations: Data reconciliation
Prerequisites
To use reconciliation checks, you must either:
Run your contract with Soda Agent, which has reconciliation support built in, or
Install the Soda Reconciliation extension locally:
pip install -i https://pypi.dev.sodadata.io/simple -U soda-reconciliation
Example
Structure
dataset:
defines the target dataset.reconciliation.source.dataset:
defines the source dataset to compare against.reconciliation.source.filter:
is an optional filter applied only to the source dataset.Each reconciliation
check:
supports an additional check-level filter applied consistently to both source and target, layered on top of dataset-level filters.Thresholds define acceptable differences.
All Common Check Configurations (filters, thresholds, names, qualifiers, attributes) apply to reconciliation checks. Common check configurations
dataset: cloud_data_source/db/schema/dataset
filter: created_at >= CURRENT_DATE - INTERVAL '1 day'
reconciliation:
source:
dataset: on_prem_data_source/db/schema/dataset
filter: created_at >= CURRENT_DATE - INTERVAL '1 day'
checks:
- row_count_diff:
threshold:
must_be_less_than: 1
- aggregate_diff:
function: avg
column: employee_key
filter: employee_key < 100
- duplicate_diff:
columns: [employee_key]
threshold:
must_be_less_than: 1
metric: percent
- freshness_diff:
column: hire_date
threshold:
must_be: 0
unit: hour
- metric_diff:
source_expression: SUM(employee_key + parent_employee_key)
target_expression: SUM(employee_key + parent_employee_key)
threshold:
must_be_less_than: 100
metric: percent
- rows_diff:
source_key_columns: [employee_key]
target_key_columns: [employee_key]
source_columns: [price, order_date]
target_columns: [price, order_date]
threshold:
must_be: 0
Row count diff
Compares the row count of the source and target datasets.
Example
reconciliation:
source:
dataset: contracts-source/postgres/public/dim_employee_copy
checks:
- row_count_diff:
threshold:
must_be_less_than: 1
Configuration keys
Aggregate diff
Compares the result of an aggregate function on a column between source and target.
Example
reconciliation:
source:
dataset: contracts-source/postgres/public/dim_employee_copy
checks:
- aggregate_diff:
function: avg
column: employee_key
filter: employee_key < 100
threshold:
must_be_less_than: 0.5
Configuration keys
function
Aggregate function (avg
, sum
, min
, max
, avg_length
, etc.)
No
column
Column to aggregate
Yes
Duplicate diff
Compares the number or percentage of duplicate rows based on one or more columns.
Example
reconciliation:
source:
dataset: contracts-source/postgres/public/dim_employee_copy
checks:
- duplicate_diff:
columns: [employee_key]
threshold:
must_be_less_than: 1
metric: percent
Configuration keys
columns
List of column(s) to evaluate duplicates on
No
filter
Reconciliation checks. Filter applied to both source and target.
Support both comparison of metric:percent
and metric:count
Yes
Freshness diff
Compares freshness (recency of the latest timestamp) between source and target.
Example
reconciliation:
source:
dataset: contracts-source/postgres/public/dim_employee_copy
checks:
- freshness_diff:
column: hire_date
threshold:
must_be_less_than: 1
unit: hour
Configuration keys
column
Timestamp column used to measure freshness
Yes
unit
Unit of time (hour
, minute
, day
)
Yes
Metric diff
Compares results of custom SQL expressions or queries across source and target.
Example
reconciliation:
source:
dataset: contracts-source/postgres/public/dim_employee_copy
checks:
- metric_diff:
source_expression: SUM(employee_key + parent_employee_key)
target_expression: SUM(employee_key + parent_employee_key)
threshold:
must_be_less_than: 100
Configuration keys
source_expression
SQL expression for source
No*
target_expression
SQL expression for target
No*
source_query
Full SQL query for source metric
No*
target_query
Full SQL query for target metric
No*
* Either expression or query must be defined.
Rows diff
Compares rows between source and target based on keys, and checks specified columns for differences.
Example
reconciliation:
source:
dataset: contracts-source/postgres/public/dim_employee_copy
checks:
- rows_diff:
source_key_columns: [employee_key]
target_key_columns: [employee_key]
source_columns: [price, order_date]
target_columns: [price, order_date]
threshold:
must_be: 0
metric: percent
Configuration keys
source_key_columns
Key column(s) to align rows in source dataset
No
target_key_columns
Key column(s) to align rows in target dataset
No
source_columns
Columns to compare in the target dataset. If omitted, all columns are compared based on column order. The number of defined source columns must match the number of defined target columns
Yes
target_columns
Columns to compare in the target dataset. If omitted, all columns are compared based on column order. The number of defined target columns must match the number of defined source columns.
Yes
threshold
Acceptable difference between source and target.
Thresholds can be defined in two ways:
As the count of differing rows between source and target.
As the percentage of differing rows, relative to the number of tested rows in the source dataset.
By default, threshold = 0
Yes
Last updated
Was this helpful?