Reconciliation checks

Yes

threshold

Acceptable difference between source and target. By default, threshold = 0

Yes

filter

Reconciliation checks. Filter applied to both source and target.

Yes

qualifier

Yes

attributes

Yes

Aggregate diff

Compares the result of an aggregate function on a column between source and target.

Example

reconciliation:
  source:
    dataset: contracts-source/postgres/public/dim_employee_copy
  checks:
    - aggregate_diff:
        function: avg
        column: employee_key
        filter: employee_key < 100
        threshold:
          must_be_less_than: 0.5
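
To make the comparison concrete, the following Python sketch mimics what the example above checks: it averages employee_key over the rows that pass the filter on each side, then tests whether the difference is below 0.5. The helper name aggregate_diff, the in-memory row lists, and the use of an absolute difference are illustrative assumptions, not the tool's actual implementation.

```python
from statistics import mean


def aggregate_diff(source_rows, target_rows, column, row_filter):
    """Absolute difference of avg(column) between source and target.

    Hypothetical helper for illustration only.
    """
    src = mean(r[column] for r in source_rows if row_filter(r))
    tgt = mean(r[column] for r in target_rows if row_filter(r))
    return abs(src - tgt)


# Made-up sample data; the row with key 150 is excluded by the filter.
source = [{"employee_key": k} for k in (1, 2, 3, 150)]
target = [{"employee_key": k} for k in (1, 2, 4, 150)]

diff = aggregate_diff(
    source, target, "employee_key", lambda r: r["employee_key"] < 100
)
passed = diff < 0.5  # mirrors threshold: must_be_less_than: 0.5
```

Because the same filter is applied to both sides, rows outside the filter never influence the result.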

Configuration keys

| Key | Description | Optional |
| --- | --- | --- |
| function | Aggregate function (avg, sum, min, max, avg_length, etc.) | |
| column | Column to aggregate | Yes |
| threshold | Acceptable difference between source and target. By default, threshold = 0 | Yes |
| filter | Filter applied to both source and target. See Reconciliation checks. | Yes |
| name | | Yes |
| qualifier | | Yes |
| attributes | | Yes |

Duplicate diff

Compares the number or percentage of duplicate rows based on one or more columns.

Example

reconciliation:
  source:
    dataset: contracts-source/postgres/public/dim_employee_copy
  checks:
    - duplicate_diff:
        columns: [employee_key]
        threshold:
          must_be_less_than: 1
          metric: percent
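
As a rough illustration of the two metrics, the Python sketch below computes a duplicate count and a duplicate percentage per side and compares the percentages. It assumes that every row repeating an already-seen key counts as one duplicate and that the check compares the absolute difference between the two sides; both are assumptions, and duplicate_stats is a hypothetical helper.

```python
from collections import Counter


def duplicate_stats(rows, columns):
    """Return (count, percent) of rows that repeat an earlier key.

    Assumed definition: each occurrence after the first is one duplicate.
    """
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    dup_count = sum(n - 1 for n in counts.values())
    dup_percent = 100.0 * dup_count / len(rows) if rows else 0.0
    return dup_count, dup_percent


# Made-up sample data: the source repeats key 2 once, the target has no repeats.
source = [{"employee_key": k} for k in (1, 2, 2, 3)]
target = [{"employee_key": k} for k in (1, 2, 3, 4)]

src_count, src_pct = duplicate_stats(source, ["employee_key"])
tgt_count, tgt_pct = duplicate_stats(target, ["employee_key"])

percent_diff = abs(src_pct - tgt_pct)
passed = percent_diff < 1  # mirrors must_be_less_than: 1 with metric: percent
```

With this sample data the check would fail, because one duplicated key in four source rows already yields a 25 percentage point difference.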

Configuration keys

| Key | Description | Optional |
| --- | --- | --- |
| columns | List of column(s) to evaluate duplicates on | |
| threshold | Acceptable difference between source and target. By default, threshold = 0 | Yes |
| filter | Filter applied to both source and target. See Reconciliation checks. Supports comparison by both metric: percent and metric: count | Yes |
| name | | Yes |
| qualifier | | Yes |
| attributes | | Yes |

Freshness diff

Compares freshness (recency of the latest timestamp) between source and target.

Example

reconciliation:
  source:
    dataset: contracts-source/postgres/public/dim_employee_copy
  checks:
    - freshness_diff:
        column: hire_date
        threshold:
          must_be_less_than: 1
          unit: hour
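
The idea can be sketched in Python as follows. The sketch assumes that freshness is measured from the latest timestamp in the column, so the difference in freshness between the two sides equals the gap between their latest timestamps. The helper freshness_diff_hours and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def freshness_diff_hours(source_timestamps, target_timestamps):
    """Gap in hours between the latest source and latest target timestamp.

    Hypothetical helper for illustration only.
    """
    gap = abs(max(source_timestamps) - max(target_timestamps))
    return gap / timedelta(hours=1)


# Made-up sample data: the target lags the source by 30 minutes.
source_hire_dates = [datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 2, 12, 0)]
target_hire_dates = [datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 2, 11, 30)]

diff_hours = freshness_diff_hours(source_hire_dates, target_hire_dates)
passed = diff_hours < 1  # mirrors must_be_less_than: 1 with unit: hour
```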

Configuration keys

| Key | Description | Optional |
| --- | --- | --- |
| column | Timestamp column used to measure freshness | Yes |
| unit | Unit of time (hour, minute, day) | Yes |
| threshold | Acceptable difference between source and target. By default, threshold = 0 | Yes |
| filter | Filter applied to both source and target. See Reconciliation checks. | Yes |
| name | | Yes |
| qualifier | | Yes |
| attributes | | Yes |

Metric diff

Compares results of custom SQL expressions or queries across source and target.

Example

reconciliation:
  source:
    dataset: contracts-source/postgres/public/dim_employee_copy
  checks:
    - metric_diff:
        source_expression: SUM(employee_key + parent_employee_key)
        target_expression: SUM(employee_key + parent_employee_key)
        threshold:
          must_be_less_than: 100
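
A minimal way to picture this check is to evaluate the same expression against two tables and compare the results. The Python sketch below does that with an in-memory SQLite database standing in for both source and target. The table names source_emp and target_emp, the helper metric, and the absolute-difference comparison are assumptions made for illustration.

```python
import sqlite3


def metric(conn, table, expression):
    """Evaluate an aggregate SQL expression against one table.

    Hypothetical helper. Table and expression are trusted literals here;
    never interpolate untrusted input into SQL like this.
    """
    return conn.execute(f"SELECT {expression} FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_emp (employee_key INTEGER, parent_employee_key INTEGER);
    CREATE TABLE target_emp (employee_key INTEGER, parent_employee_key INTEGER);
    INSERT INTO source_emp VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO target_emp VALUES (1, 10), (2, 10), (3, 25);
    """
)

expression = "SUM(employee_key + parent_employee_key)"
source_value = metric(conn, "source_emp", expression)
target_value = metric(conn, "target_emp", expression)

diff = abs(source_value - target_value)
passed = diff < 100  # mirrors threshold: must_be_less_than: 100
```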

Configuration keys

| Key | Description | Optional |
| --- | --- | --- |
| source_expression | SQL expression for source | No* |
| target_expression | SQL expression for target | No* |
| source_query | Full SQL query for source metric | No* |
| target_query | Full SQL query for target metric | No* |
| threshold | Acceptable difference between source and target. By default, threshold = 0 | Yes |
| filter | Filter applied to both source and target. See Reconciliation checks. | Yes |
| name | | Yes |
| qualifier | | Yes |
| attributes | | Yes |

* Either expression or query must be defined.

Rows diff

Compares rows between source and target based on keys, and checks specified columns for differences.

Example

reconciliation:
  source:
    dataset: contracts-source/postgres/public/dim_employee_copy
  checks:
    - rows_diff:
        source_key_columns: [employee_key]
        target_key_columns: [employee_key]
        source_columns: [price, order_date]
        target_columns: [price, order_date]
        threshold:
          must_be: 0
          metric: percent
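
The row-level comparison can be sketched in Python as below. This is a simplified illustration under stated assumptions: rows are aligned on the key columns, a source row counts as differing when it has no match in the target or when any compared column differs, and the percentage is relative to the number of source rows tested. Target rows with no source match are ignored, and both sides use the same column names. The helper rows_diff and the sample data are hypothetical.

```python
def rows_diff(source_rows, target_rows, key_columns, compare_columns):
    """Return (count, percent) of source rows missing from or differing in target.

    Hypothetical helper with assumed semantics, for illustration only.
    """
    def key(row):
        return tuple(row[c] for c in key_columns)

    target_by_key = {key(r): r for r in target_rows}
    differing = 0
    for src in source_rows:
        tgt = target_by_key.get(key(src))
        if tgt is None or any(src[c] != tgt[c] for c in compare_columns):
            differing += 1
    percent = 100.0 * differing / len(source_rows) if source_rows else 0.0
    return differing, percent


# Made-up sample data: key 2 has a different price, key 3 is missing in target.
source = [
    {"employee_key": 1, "price": 10, "order_date": "2024-01-01"},
    {"employee_key": 2, "price": 20, "order_date": "2024-01-02"},
    {"employee_key": 3, "price": 30, "order_date": "2024-01-03"},
    {"employee_key": 4, "price": 40, "order_date": "2024-01-04"},
]
target = [
    {"employee_key": 1, "price": 10, "order_date": "2024-01-01"},
    {"employee_key": 2, "price": 25, "order_date": "2024-01-02"},
    {"employee_key": 4, "price": 40, "order_date": "2024-01-04"},
]

count, percent = rows_diff(source, target, ["employee_key"], ["price", "order_date"])
passed = percent == 0  # mirrors must_be: 0 with metric: percent
```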

Configuration keys

| Key | Description | Optional |
| --- | --- | --- |
| source_key_columns | Key column(s) to align rows in source dataset | |
| target_key_columns | Key column(s) to align rows in target dataset | |
| source_columns | Columns to compare in the source dataset. If omitted, all columns are compared based on column order. The number of defined source columns must match the number of defined target columns. | Yes |
| target_columns | Columns to compare in the target dataset. If omitted, all columns are compared based on column order. The number of defined target columns must match the number of defined source columns. | Yes |
| threshold | Acceptable difference between source and target. Can be defined either as the count of differing rows between source and target, or as the percentage of differing rows relative to the number of tested rows in the source dataset. By default, threshold = 0 | Yes |
| filter | Filter applied to both source and target. See Reconciliation checks. | Yes |
| name | | Yes |
| qualifier | | Yes |

attributes