Link Search Menu Expand Document

Metrics

A metric is a property of the data in your database. A measurement is the value for a metric that Soda checks against during a scan. You use metrics to define the tests that Soda executes against your data during a scan.

For example, in the test defined as row_count > 0, row_count is the metric and 0 is the measurement. When it runs a scan, Soda executes the test against your dataset; if the row count is greater than 0, the test passes; if the dataset is empty, the test fails.

  • In Soda SQL, you use metrics to define tests in your scan YAML file. Read more about configuring metrics.
  • In Soda Cloud, you use metrics to define a test as part of the process to create a new monitor. Read more about creating a monitor.

There are three kinds of metrics Soda uses:

  • Dataset metrics for tests that execute against an entire dataset
  • Column metrics for tests that execute against an individual column
  • Custom metrics, also known as SQL metrics, that enable you to define SQL queries that Soda SQL executes against an entire dataset or against individual columns

Dataset metrics

Use dataset metrics in tests that execute against all data in the dataset during a scan.

Dataset metric Description
row_count The number of rows in a dataset.
schema A list of column names in a dataset, and their data types.

Column metrics

Use column metrics in tests that execute against specific columns in a dataset during a scan.

Where a column metric references a valid or invalid value, or a limit, use the metric in conjunction with a column configuration key in Soda SQL or a Validity Rule in Soda Cloud. A Soda scan uses the value of a column configuration key / validity rule to determine if it should pass or fail a test.

Column metric Description Applies to data type Column config key(s) / Validity Rule(s)
avg The calculated average of the values in a numeric column. number -
avg_length The average length of string values in a column. text -
invalid_count The number of rows that contain invalid values. text valid_format
valid_regex
valid_values
valid_min_length
valid_max_length
invalid_percentage The percentage of rows that contain invalid values. text valid_format
valid_regex
valid_values
valid_min_length
valid_max_length
max The greatest value in a numeric column. number -
max_length The maximum length of string values in a column. text -
min The smallest value in a numeric column. number -
min_length The minimum length of string values in a column. text -
missing_count The number of rows in a column that do not contain specific content. text, number, date missing_format
missing_regex
missing_values
missing_percentage The percentage of rows in a column that do not contain specific content. text, number, date missing_format
missing_regex
missing_values
row_count The number of rows in a column. text, number, date -
stddev The calculated standard deviation of values in a numeric column. number -
sum The calculated sum of the values in a numeric column. number -
valid_count The number of rows that contain valid content. text valid_format
valid_regex
valid_values
valid_min_length
valid_max_length
valid_percentage The percentage of rows that contain valid content. text valid_format
valid_regex
valid_values
valid_min_length
valid_max_length
values_count The number of rows that contain content included in a list of valid values. text valid_values
valid_regex
values_percentage The percentage of rows that contain content identified by valid values. text valid_values
valid_regex
variance The calculated variance of the values in a numeric column. number -

Column configuration keys or validity rules

Column config key(s) / Validity Rule(s) Description Values
metric_groups Only available in Soda SQL.
Specifies pre-defined groups of metrics that Soda computes for this column. See Metric groups and dependencies for details.
duplicates
length
missing
profiling
statistics
validity
missing_format Specifies missing values such as whitespace or empty strings.  
missing_regex Use regex expressions to specify your own custom missing values. regex, no forward slash delimiters
missing_values Specifies the values that Soda is to consider missing in list format. values in a list
valid_format Specifies a named valid text format. Can apply only to columns using data type TEXT. See Data types. See Valid format values table.
valid_max Specifies a maximum value for valid values. integer
valid_max_length Specifies a maximum string length for valid values. integer
valid_min Specifies a minimum value for valid values. integer
valid_min_length Specifies a minimum string length for valid values. integer
valid_regex Use regex expressions to specify your own custom valid values. regex, no forward slash delimiters
valid_values Specifies several valid values in list format. values in a list

Valid format values

Valid formats are experimental and subject to change.
Valid formats apply only to columns using data type TEXT. See Data types.

Valid format value
Format
number_whole Number is whole.
number_decimal_point Number uses . as decimal indicator.
number_decimal_comma Number uses , as decimal indicator.
number_percentage Number is a percentage.
number_money_usd Number matches US dollar currency pattern.
number_money_eur Number matches Euro currency pattern.
number_money_gbp Number matches British pound currency pattern.
number_money_rmb Number matches Renminbi yuan currency pattern.
number_money_chf Number matches Swiss franc currency pattern.
number_money Format matches any of the number_money_ patterns.
date_eu Validates date only, not time.
dd/mm/yyyy
date_us Validates date only, not time.
mm/dd/yyyy
date_inverse Validates date only, not time.
yyyy/mm/dd
time 11:59:00,000
11:59:00
11:59
11-59-00,000
23:59:00,000
Noon
1,159
date_iso_8601 Validates date and/or time according to ISO 8601 format .
2021, January 21
October 21, 2015
uuid Universally unique identifier.
ip_address Four whole numbers separated by .
email name@domain.extension
phone_number +12 123 123 1234
123 123 1234
+1 123-123-1234
+12 123-123-1234
+12 123 123-1234
555-2368
555-ABCD
credit_card_number Four four-digit numbers separated by spaces.
Four four-digit numbers separated by dashes.
Sixteen-digit number.
Four five-digit numbers separated by spaces.

Custom metrics

If the built-in dataset and column metrics that Soda offers do not quite give you the information you need from a scan, you can use custom metrics in Soda SQL to customize your queries. Custom metrics, also known as SQL metrics, essentially enable you to add SQL queries to your scan YAML file so that Soda SQL runs them during a scan.

Read more about using custom metrics in Soda SQL.

Go further


Last modified on 16-Jul-21

Was this documentation helpful?
Give us your feedback in the #soda-docs channel in the Soda community on Slack or open an issue in GitHub.