Link Search Menu Expand Document

User-defined checks

Last modified on 26-Jan-23

If the built-in set of metrics and checks that SodaCL offers do not quite give you the information you need from a scan, you can define your own metrics to customize your checks. User-defined checks essentially enable you to create common-table expressions or SQL queries that Soda Core runs during a scan.

Define user-defined checks
Optional check configurations
List of comparison symbols and phrases
Go further

Define user-defined checks

In the context of SodaCL check types, these are user-defined checks. Truly, it is the metric that you define yourself, then use in a check.

The example below uses common table expression (CTE) to define the metric that is then used in the check. The check itself follows the simple pattern of a standard check that uses a metric, a comparison symbol or phrase, and a threshold. You specify the CTE value for the custom metric using a nested expression key which also defines the name of the new custom metric.

  • The name you provide for a custom metric must not contain spaces.
checks for dim_reseller:
  - avg_order_span between 5 and 10:
      avg_order_span expression: AVG(last_order_year - first_order_year)
custom metric avg_order_span
comparison symbol or phrase between
threshold 5 and 10
expression key avg_order_span expression
expression value AVG(last_order_year - first_order_year)


Instead of using CTE to define a custom metric, you can use a SQL query. The example check below follows the same standard check pattern, but includes a nested query key to define the custom metric and its name.

  • The name you provide for a custom metric must not contain spaces.
  • Though you specify the dataset against which to run the query in the SQL query, you must also provide the dataset identifier in the checks for section header. Without the dataset identifier, Soda Core cannot send the check results to Soda Cloud.
checks for dim_product:
  - product_stock >= 50:
      product_stock query: |
        SELECT COUNT(safety_stock_level - days_to_manufacture)
        FROM dim_product
custom metric product_stock
comparison symbol or phrase >=
threshold 50
query key product_stock query
query value SELECT COUNT(safety_stock_level - days_to_manufacture) FROM dim_product


Once defined in your checks YAML file, you can reference a user-defined metric in other checks in the same file. The user defined metric must exist in the file above other checks that reference it.

The following example includes a second check that references the custom metric defined above it.

checks for dim_product:
  - product_stock >= 50:
      product_stock query: |
        SELECT COUNT(safety_stock_level - days_to_manufacture)
        FROM dim_product
  - anomaly score for product_stock < default


Optional check configurations

Supported Configuration Documentation
Define a name for a user-defined check; see example. Customize check names
Add an identity to a check. Add a check identity
Define alert configurations to specify warn and fail alert conditions; see example. Add alert configurations
  Apply an in-check filter to return results for a specific portion of the data in your dataset. -
Use quotes when identifying dataset or column names; see example.
Note that the type of quotes you use must match that which your data source uses. For example, BigQuery uses a backtick (`) as a quotation mark.
Use quotes in a check
Use wildcard characters in the value in the check. Use wildcard values as you would with CTE or SQL.
Use for each to apply user-defined checks to multiple datasets in one scan; see example. Apply checks to multiple datasets
Apply a dataset filter to partition data during a scan; see example.
Known issue: Dataset filters are not compatible with user-defined checks which use a SQL query. With such a check, Soda does not apply the dataset filter at scan time.
Scan a portion of your dataset

Example with check name

checks for dim_product:
  - product_stock >= 50:
      name: Product stock 
      product_stock query: |
        SELECT COUNT(safety_stock_level - days_to_manufacture)
        FROM dim_product

Example with alert configuration

  - avg_order_span:
      avg_order_span expression: AVG(last_order_year - first_order_year)
      warn: when > 50
      fail: when > 200

Example with quotes

checks for dim_product:
  - product_stock >= 50:
      product_stock query: |
        SELECT COUNT("safety_stock_level" - "days_to_manufacture")
        FROM dim_product

Example with for each

for each dataset T:
  datasets:
    - dim_reseller
  checks:
    - avg_order_span between 5 and 10:
        avg_order_span expression: AVG(last_order_year - first_order_year)

Example with dataset filter

filter FULFILLMENT [daily]:
  where: TIMESTAMP '{ts_start}' <= "ts" AND "ts" < TIMESTAMP '${ts_end}'

checks for FULFILLMENT [daily]:
  - avg_order_span between 5 and 10:
      avg_order_span expression: AVG(last_order_day - first_order_day)


List of comparison symbols and phrases

 = 
 < 
 >
 <=
 >=
 !=
 <> 
 between 
 not between 

Go further


Was this documentation helpful?

What could we do to improve this page?


Last modified on 26-Jan-23