Link Search Menu Expand Document

Freshness checks

Use a freshness check to determine the relative age of the data in a column in your dataset.

checks for dim_product:
  - freshness(start_date) < 3d

Define freshness checks
Freshness check results
Optional check configurations
List of freshness thresholds
List of comparison symbols and phrases
Go further

Define freshness checks

A freshness check measures the age of the youngest row in a table. Using this check, you can infer the freshness of the data in the dataset using the age of the most recently added row in a table.

In the context of SodaCL check types, freshness checks are unique. This check is limited in its syntax variation, with only a few mutable parts to specify column name, threshold, and, optionally, a “now” variable.

A freshness check has two or three mutable parts:

a timestamp column name
a variable to specify the value of “now” (optional)
a threshold

The example below defines a check that measures freshness relative to “now”, where “now” is the moment you run the scan that executes the freshness check. This example discovers when the last row was added to the start_date timestamp column, then compares that timestamp to “now”. If Soda discovers that the last row was added more than three days ago, the check fails.

  • Freshness checks only work with columns that contain timestamp values.
  • The only comparison symbol you can use with freshness checks is < except when you employ and alert configuration. See Example with alert configuration for details.
  • The default value for “now” is the time you run the scan that executes the freshness check.
  • If no timezone information is available in either the timestamp of the check (scan time), or in the data in the column, a freshness check uses the UTC timezone. Soda converts both timestamps to UTC to compare values.

checks for dim_product:
  - freshness(start_date) < 3d
timestamp column name start_date
threshold 3d

Instead of using the default value for “now” (the time you run the scan that executes the freshness check), you can use a variable to specify the value of “now” at scan time. For example, the following check measures freshness relative to a date that a user specifies at scan time.

checks for dim_product:
  - freshness(end_date, CUST_VAR) < 1d
timestamp column name end_date
variable to specify the value of “now” (optional) CUST_VAR
threshold 1d

At scan time, you use a -v option to pass a value for the variable that the check expects for the value of “now”. The scan command below passes a variable that the check uses.

soda scan -d adventureworks -c configuration.yml -v CUST_VAR=2022-05-31 checks_test.yml

Troubleshoot errors with freshness checks

Problem: When you run a scan to execute a freshness check, the CLI returns one of the following error message when you run a scan.

Invalid staleness threshold "when < 3256d"
  +-> line=2,col=5 in checks_test.yml

Invalid check "freshness(start_date) > 1d": no viable alternative at input ' >'

Solution: The error indicates that you are using an incorrect comparison symbol. Remember that freshness checks can only use < in check, unless the freshness check employs an alert configuration, in which case it can only use > in the check.

Freshness check results

When you run a scan that includes a freshness check, the output in the Soda Core CLI provides several values for measurements Soda used to calculate freshness. The value for freshness itself is displayed in days, hours, minutes, seconds, and milliseconds; see the example below.

In Soda Cloud, the freshness value represents age of the data in the days, hours, minutes, etc. relative to now_timestamp.

Soda Core 3.0.x
Scan summary:
1/1 checks FAILED: 
      Data is fresh [FAILED]
        max_column_timestamp: 2013-07-01 00:00:00
        max_column_timestamp_utc: 2013-07-01 00:00:00+00:00
        now_variable_name: NOW
        now_timestamp: 2022-09-13T16:40:39.196522+00:00
        now_timestamp_utc: 2022-09-13 16:40:39.196522+00:00
        freshness: 3361 days, 16:40:39.196522
Oops! 1 failures. 0 warnings. 0 errors. 0 pass.

Optional check configurations

Supported Configuration Documentation
Define a name for a freshness check; see example. Customize check names
Define alert configurations to specify warn and fail thresholds; see example. Add alert configurations
  Apply an in-check filter to return results for a specific portion of the data in your dataset. -
Use quotes when identifying dataset or column names; see example Use quotes in a check
  Use wildcard characters ( % or * ) in values in the check. -
Use for each to apply freshness checks to multiple datasets in one scan; see example. Apply checks to multiple datasets
Apply a dataset filter to partition data during a scan; see example. Scan a portion of your dataset

Example with check name

checks for dim_product:
  - freshness(start_date) < 27h:
      name: Data is fresh

Example with alert configuration

The only comparison symbol that you can use with freshness checks that employ an alert configuration is >.

checks for dim_product:
  - freshness(start_date):
      warn: when > 3256d
      fail: when > 3258d


checks for dim_product:
  - freshness(start_date):
        when > 3256d
        when > 3258d

Example with quotes

checks for dim_product:
  - freshness("end_date") < 3d

Example with for each

for each dataset T:
    - dim_prod%
    - freshness(end_date) < 3d

Example with dataset filter

coming soon

List of freshness thresholds

Threshold Example Reads as
#d 3d 3 days
#h 1h 1 hour
#m 30m 30 minutes
#d#h 1d6h 1 day and 6 hours
#h#m 1h30m 1 hour and 30 minutes

List of comparison symbols and phrases

# If using without an alert configuration
# If using with an alert configuration

Go further

Was this documentation helpful?

What could we do to improve this page?

Last modified on 30-Sep-22