A test is a check that Soda SQL performs when it scans a table in your warehouse. Technically, it is a Python expression that, during a Soda SQL scan, checks metrics to see if they match the parameters defined for a measurement. A single Soda SQL scan runs against a single table in your database, but each scan can run multiple tests against multiple columns.
As a result of a scan, each test either passes or fails. When a test fails, it means that a property of the data in your table did not match the test parameters you defined. In other words, any test that returns
true during a Soda SQL scan passes; any test that returns
The scan results appear in your command-line interface (CLI). The results include an exit code which is an indicator of the test results:
0 means all tests passed; a non-zero value means one or more tests have failed. See Scan output for details.
Soda Cloud refers to tests as monitors. Refer to Create monitors and alerts to learn how to define a monitor using Soda Cloud.
You define your tests in your scan YAML file which is associated with a specific table in your database. You can write tests using default metrics that Soda SQL applies to an entire table (table metrics), or to individual columns you identify (column metrics). You can also write tests using custom metrics (SQL metrics) that you can apply to an entire table or individual columns. See example tests that use each default metric.
Regardless of where it applies, each test is generally comprised of three parts:
- a metric (a property of the data in your database)
- a comparison operator
- a value
tests: - row_count > 0
At times, a test may only include one part, a metric, that simply returns a calculated value. For example, you may wish to write a test that simply returns the calculated sum of the values in a numeric column.
columns: commission_paid: tests: - sum
However, where a test must determine whether or not data is valid, you must add a fourth element, a column configuration key to define what qualifies as valid. In the scan YAML file, you define a column configuration key before the test that will use the definition of “valid”.
In the example below, the user defined the
date_eu or dd/mm/yyyy format. The metric
invalid_percentage refers to the
valid_format configuration key to determine if the data in the column is valid. Note that
valid_format applies only to columns with data type TEXT. Refer to Data types for details. To see a list of all available column configuration keys, see Column Metrics.
columns: start_date: valid_format: date_eu tests: - invalid_percentage < 2.0
See example tests that use each default metric.
Reference the table below which corresponds to the following example scan YAML file. Both the
feepct columns are of data type TEXT, enabling the user to define a
valid_format for the contents of the columns. See Valid format for details.
table_name: demodata metrics: - row_count - missing_count - missing_percentage - values_count - ... tests: - row_count > 0 columns: id: valid_format: uuid tests: - invalid_percentage == 0 feepct: valid_format: number_percentage tests: - invalid_percentage == 0
|Default metric||Comparison operator||Value||Applies to||Test|
| || || ||whole table||Checks to see if the table has at least one row. If the test fails, it means the table has no rows, which means that the table is empty.|
| || || || ||Checks to see if all rows in the id column contain data in a valid format. If the test fails, it means that more than 0% of the rows contain invalid data, which is data that is in non-UUID format.|
| || || || ||Checks to see if all rows in the |
See example tests that use each default metric.
If the default set of table and column metrics that Soda SQL offers do not quite give you the information you need from a scan, you can use SQL metrics to customize your queries. SQL metrics essentially enable you to add SQL queries to your scan YAML file so that Soda SQL runs them during a scan. See SQL metrics
Reference the table below which corresponds to the following example scan YAML file.
table_name: yourtable sql_metrics: - sql: | SELECT sum(volume) as total_volume_us FROM CUSTOMER_TRANSACTIONS WHERE country = 'US' tests: - total_volume_us > 5000
|Custom metric||Comparison operator||Value||Applies to||Test|
| || || ||whole table||Checks to see if the sum of all customer transactions in the United States exceeds |
Soda SQL can run both anonymous or named tests. Named tests are useful if you intend to push Soda SQL scan results to your Soda Cloud account where you can update a test and retain its test history.
Example of an anonymous test
tests: - total_volume_us > 5000
Examples of named tests
tests: volume_test_max: total_volume_us > 3000 volume_test_min: total_volume_us < 5000