A scan is a command that executes checks to extract information about data in a dataset. A check is a test that Soda Core performs when it scans a dataset in your data source. Soda Core uses the checks you define in the checks YAML file to prepare SQL queries that it runs against the data in a table. Soda Core can execute multiple checks against one or more datasets in a single scan.
Each scan requires the following as input:
- the name of the data source that contains the dataset you wish to scan, identified using the -d option
- a configuration.yml file, which contains details about how Soda Core can connect to your data source, identified using the -c option
- a checks.yml file, which contains the checks you write using SodaCL
soda scan -d postgres_retail -c configuration.yml checks.yml
Note that you can use the -c option to include multiple configuration.yml files in one scan execution.
Include the filepath of each YAML file if you stored them in a directory other than the one in which you installed Soda Core.
soda scan -d postgres_retail -c other-directory/configuration.yml other-directory/checks.yml
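As a rough illustration, the same command line could be assembled in Python before a script or orchestration task runs it. The helper name build_scan_command is hypothetical and not part of Soda Core, and repeating -c once per configuration file is an assumption about how multiple files are passed; confirm the exact syntax with soda scan --help.

```python
import shlex
from typing import List, Sequence


def build_scan_command(
    data_source: str,
    configuration_files: Sequence[str],
    checks_files: Sequence[str],
) -> List[str]:
    """Assemble argv for `soda scan` (hypothetical helper, not part of Soda Core)."""
    if not configuration_files or not checks_files:
        raise ValueError("need at least one configuration file and one checks file")
    command = ["soda", "scan", "-d", data_source]
    for path in configuration_files:
        # Assumption: -c is repeated once per configuration file.
        command += ["-c", path]
    # Checks files are passed as positional arguments, as in the commands above.
    command += list(checks_files)
    return command


command = build_scan_command(
    "postgres_retail",
    ["other-directory/configuration.yml"],
    ["other-directory/checks.yml"],
)
# Print the command as a single shell-style string for review.
print(shlex.join(command))
```

Where Soda Core is installed, the resulting list can be handed to subprocess.run(command) to execute the scan.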
Use the soda scan --help command to review options you can include to customize the scan.
As a result of a scan, each check results in one of three default states:
- pass: the values in the dataset match or fall within the thresholds you specified
- fail: the values in the dataset do not match or fall within the thresholds you specified
- error: the syntax of the check is invalid
A fourth state, warn, is something you can explicitly configure for individual checks. See Add alert configurations.
The scan results appear in your command-line interface (CLI) and, if you have connected Soda Core to a Soda Cloud account, in the Monitors Results dashboard in the Soda Cloud web application.
Soda Core 3.0.0bx
Scan summary:
1/1 check PASSED:
    dim_customer in adventureworks
      row_count > 0 [PASSED]
All is good. No failures. No warnings. No errors.
Sending results to Soda Cloud
Example output with a check that triggered a warning:
Soda Core 0.0.x
Scan summary:
1/1 check WARNED:
    CUSTOMERS in postgres_retail
      schema [WARNED]
        missing_column_names = [sombrero]
        schema_measured = [geography_key, customer_alternate_key, title, first_name, last_name ...]
Only 1 warning. 0 failure. 0 errors. 0 pass.
Example output with a check that failed:
Soda Core 0.0.x
Scan summary:
1/1 check FAILED:
    CUSTOMERS in postgres_retail
      freshness(full_date_alternate_key) < 3d [FAILED]
        max_column_timestamp: 2020-06-24 00:04:10+00:00
        max_column_timestamp_utc: 2020-06-24 00:04:10+00:00
        now_variable_name: NOW
        now_timestamp: 2022-03-10T16:30:12.608845
        now_timestamp_utc: 2022-03-10 16:30:12.608845+00:00
        freshness: 624 days, 16:26:02.608845
Oops! 1 failures. 0 warnings. 0 errors. 0 pass.
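If only the CLI text is available, one way to tally outcomes is to count the bracketed markers on the check lines. This is a minimal sketch that assumes the output keeps the [PASSED], [WARNED] and [FAILED] markers seen in the example outputs above; the log text is not a stable interface and may differ between versions. The name count_outcomes is hypothetical.

```python
import re
from collections import Counter

# Assumption: each check line carries one of these bracketed markers,
# as in the example outputs above.
MARKER = re.compile(r"\[(PASSED|WARNED|FAILED)\]")


def count_outcomes(cli_output: str) -> Counter:
    """Count check outcomes found in `soda scan` text output."""
    return Counter(match.group(1) for match in MARKER.finditer(cli_output))


sample = """Soda Core 0.0.x
Scan summary:
1/1 check FAILED:
    CUSTOMERS in postgres_retail
      freshness(full_date_alternate_key) < 3d [FAILED]
Oops! 1 failures. 0 warnings. 0 errors. 0 pass."""

outcomes = count_outcomes(sample)
# Missing keys in a Counter read as zero, so absent outcomes are safe to query.
print(outcomes["FAILED"], outcomes["WARNED"], outcomes["PASSED"])
```

For anything beyond a quick look at logs, the programmatic scan interface is likely a sturdier source of results than parsing text.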
Optionally, you can insert the output of Soda Core scans into your data orchestration tool, such as Dagster or Apache Airflow.
You can save Soda Core scan results anywhere in your system; the scan_result object contains all the scan result information. To import the Soda Core library in Python so you can use the Scan() object, install a Soda Core package, then use from soda.scan import Scan. Refer to Define programmatic scans for details.
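A programmatic scan might look roughly like the sketch below. The function name run_programmatic_scan and its scan_factory parameter are hypothetical and exist only so the function can be exercised without a live data source. The Scan method names follow the Soda Core 3.0 programmatic scan interface as best understood here, and the assumption that execute() returns an exit code should be verified against the installed version.

```python
from typing import Any, Callable, Optional


def run_programmatic_scan(
    data_source: str,
    configuration_file: str,
    checks_file: str,
    scan_factory: Optional[Callable[[], Any]] = None,
) -> int:
    """Run one scan through the Python API and return its exit code."""
    if scan_factory is None:
        # Imported lazily so this module loads even where Soda Core is absent.
        from soda.scan import Scan

        scan_factory = Scan

    scan = scan_factory()
    # Equivalent of the -d option on the command line.
    scan.set_data_source_name(data_source)
    # Equivalent of the -c option on the command line.
    scan.add_configuration_yaml_file(file_path=configuration_file)
    # Equivalent of the checks.yml positional argument.
    scan.add_sodacl_yaml_file(checks_file)
    # Assumption: execute() returns the scan's exit code.
    exit_code = scan.execute()
    print(scan.get_logs_text())
    return exit_code
```

With a Soda Core package installed, a call such as run_programmatic_scan("postgres_retail", "configuration.yml", "checks.yml") would mirror the CLI example earlier on this page.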
Further, in your orchestration tool, you can use Soda Core scan results to block the data pipeline if it encounters bad data, or to run in parallel to surface issues with your data. Learn how to Configure orchestrated scans.
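The blocking idea can be sketched independently of any particular orchestrator: tally the check outcomes, then raise an exception so the task fails and downstream steps do not run. Everything named here (gate_pipeline, BadDataError, the shape of the outcomes mapping) is hypothetical and only illustrates the decision; it is not a Soda Core or orchestrator API.

```python
from typing import Mapping


class BadDataError(RuntimeError):
    """Raised to stop a pipeline step (hypothetical exception name)."""


def gate_pipeline(outcomes: Mapping[str, int], block_on_warn: bool = False) -> None:
    """Raise if the scan outcomes should stop downstream tasks."""
    # Failures and errors always block; warnings block only on request.
    blocking = ["fail", "error"] + (["warn"] if block_on_warn else [])
    problems = {
        state: outcomes.get(state, 0)
        for state in blocking
        if outcomes.get(state, 0) > 0
    }
    if problems:
        raise BadDataError(f"scan reported blocking outcomes: {problems}")


# Warnings alone do not block by default, so this call returns quietly.
gate_pipeline({"pass": 3, "warn": 1})

try:
    gate_pipeline({"pass": 2, "fail": 1})
except BadDataError as exc:
    print(exc)
```

Most orchestration tools treat an uncaught exception as a failed task, which is what keeps later steps from consuming bad data. To surface issues without blocking, the same tally can be logged instead of raised.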
When you run a scan in Soda Core, you can specify options that modify the scan actions or output. Add one or more of the following options to a soda scan command.
|Option|Description and example|
|-c|(Required) Use this option to specify the file path and file name for the configuration YAML file.|
|-d|(Required) Use this option to specify the data source that contains the datasets you wish to scan.|
| |Replace |
| |Return scan output in verbose mode to review query details.|
- Consider completing the Quick start for SodaCL to learn how to write more checks for data quality.
- Need help? Join the Soda community on Slack.
Last modified on 01-Jul-22