
Run a Soda Core scan

A scan is a command that executes checks to extract information about data in a dataset. A check is a test that Soda Core performs when it scans a dataset in your data source. Soda Core uses the checks you define in the checks YAML file to prepare SQL queries that it runs against the data in a table. Soda Core can execute multiple checks against one or more datasets in a single scan.

Anatomy of a scan command
Scan output
Programmatically use scan output
Add scan options
Go further

Anatomy of a scan command

Each scan requires the following as input:

  • the name of the data source that contains the dataset you wish to scan, identified using the -d option
  • a configuration.yml file, which contains details about how Soda Core can connect to your data source, identified using the -c option
  • a checks.yml file, which contains the checks you write using SodaCL

Scan command:

soda scan -d postgres_retail -c configuration.yml checks.yml


Note that you can use the -c option to include multiple configuration.yml files in one scan execution.

Include the filepath of each YAML file if you stored the files in a directory other than the one from which you run the scan command.

soda scan -d postgres_retail -c other-directory/configuration.yml other-directory/checks.yml


Use the soda scan --help command to review the options you can include to customize the scan.
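If you launch scans from a script rather than typing them, a small helper can assemble the command from the three required inputs. The following is a minimal Python sketch, not part of Soda Core itself: build_scan_command and run_soda_cli are hypothetical names, and the sketch assumes that you pass multiple configuration files by repeating the -c option and that the soda executable is on your PATH.

```python
import subprocess
from typing import List, Sequence


def build_scan_command(
    data_source: str,
    configuration_files: Sequence[str],
    checks_files: Sequence[str],
) -> List[str]:
    """Hypothetical helper: assemble a `soda scan` argument list from its required inputs."""
    if not configuration_files or not checks_files:
        raise ValueError("a scan needs at least one configuration file and one checks file")
    command = ["soda", "scan", "-d", data_source]
    # Assumption: the -c option is repeated once per configuration file.
    for path in configuration_files:
        command += ["-c", path]
    # Checks files are positional arguments that follow the options.
    command += list(checks_files)
    return command


def run_soda_cli(command: List[str]) -> int:
    """Run the scan and return the process exit code (requires Soda Core on PATH)."""
    return subprocess.run(command, check=False).returncode
```

For example, build_scan_command("postgres_retail", ["other-directory/configuration.yml"], ["other-directory/checks.yml"]) yields the same arguments as the second command above. Passing a list to subprocess.run, rather than a single shell string, avoids shell quoting problems with paths that contain spaces.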

Scan output

After a scan, each check results in one of three default states:

  • pass: the values in the dataset match or fall within the thresholds you specified
  • fail: the values in the dataset do not match or fall within the thresholds you specified
  • error: the syntax of the check is invalid

A fourth state, warn, is something you can explicitly configure for individual checks. See Add alert configurations.

The scan results appear in your command-line interface (CLI) and, if you have connected Soda Core to a Soda Cloud account, in the Monitors Results dashboard in the Soda Cloud web application.

Example output with a check that passed:

Soda Core 3.0.0bx
Scan summary:
1/1 check PASSED: 
    dim_customer in adventureworks
      row_count > 0 [PASSED]
All is good. No failures. No warnings. No errors.
Sending results to Soda Cloud


Example output with a check that triggered a warning:

Soda Core 0.0.x
Scan summary:
1/1 check WARNED: 
    CUSTOMERS in postgres_retail
      schema [WARNED]
        missing_column_names = [sombrero]
        schema_measured = [geography_key, customer_alternate_key, title, first_name, last_name ...]
Only 1 warning. 0 failure. 0 errors. 0 pass.

Example output with a check that failed:

Soda Core 0.0.x
Scan summary:
1/1 check FAILED: 
    CUSTOMERS in postgres_retail
      freshness(full_date_alternate_key) < 3d [FAILED]
        max_column_timestamp: 2020-06-24 00:04:10+00:00
        max_column_timestamp_utc: 2020-06-24 00:04:10+00:00
        now_variable_name: NOW
        now_timestamp: 2022-03-10T16:30:12.608845
        now_timestamp_utc: 2022-03-10 16:30:12.608845+00:00
        freshness: 624 days, 16:26:02.608845
Oops! 1 failures. 0 warnings. 0 errors. 0 pass.
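If you capture the CLI output, for example in a CI job log, you can tally check states from the bracketed markers shown in the examples above. This Python sketch is an illustration under assumptions: it recognizes only the [PASSED], [WARNED], and [FAILED] markers that appear in these samples (no error marker is shown, so it does not detect errors), the CLI text is not necessarily a stable format, and count_check_states and should_block are hypothetical names. For anything more than a quick check, prefer the programmatic Scan object.

```python
import re
from collections import Counter

# Matches the bracketed state at the end of a check line, e.g. "row_count > 0 [PASSED]".
# Only the three markers shown in the sample outputs are recognized.
_STATE = re.compile(r"\[(PASSED|WARNED|FAILED)\]\s*$")


def count_check_states(output: str) -> Counter:
    """Hypothetical helper: count check outcomes in captured `soda scan` output."""
    counts: Counter = Counter()
    for line in output.splitlines():
        match = _STATE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def should_block(output: str) -> bool:
    """Treat any FAILED check as a reason to stop the pipeline."""
    return count_check_states(output)["FAILED"] > 0
```

Summary lines such as "1/1 check FAILED:" carry no brackets, so only the per-check lines are counted.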

Programmatically use scan output

Optionally, you can feed the output of Soda Core scans into a data orchestration tool such as Dagster or Apache Airflow.

You can save Soda Core scan results anywhere in your system; the scan_result object contains all the scan result information. To use the Scan() object in Python, install a Soda Core package, then import it with from soda.scan import Scan. Refer to Define programmatic scans for details.

Further, in your orchestration tool, you can use Soda Core scan results to stop the data pipeline when a scan finds bad data, or run scans in parallel with the pipeline to surface data issues without blocking it. Learn how to Configure orchestrated scans.
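A pipeline gate can be as small as one check on the scan result. This is a minimal Python sketch, assuming the Scan methods it calls (set_data_source_name, add_configuration_yaml_file, add_sodacl_yaml_file, execute, has_check_fails, get_logs_text) exist in your installed Soda Core version; confirm them in Define programmatic scans. The names run_programmatic_scan, block_on_bad_data, and ScanLike are hypothetical.

```python
from typing import Protocol


class ScanLike(Protocol):
    """The subset of the Scan interface this sketch relies on (assumed)."""

    def has_check_fails(self) -> bool: ...

    def get_logs_text(self) -> str: ...


def run_programmatic_scan(data_source: str, configuration: str, checks: str) -> ScanLike:
    """Hypothetical helper: run a programmatic scan and return the Scan object."""
    # Import lazily so this module loads even where Soda Core is not installed.
    from soda.scan import Scan

    scan = Scan()
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file(file_path=configuration)
    scan.add_sodacl_yaml_file(file_path=checks)
    scan.execute()
    return scan


def block_on_bad_data(scan: ScanLike) -> None:
    """Hypothetical gate: raise, and so fail the orchestration task, when any check fails."""
    if scan.has_check_fails():
        raise RuntimeError("Soda Core scan found failing checks:\n" + scan.get_logs_text())
```

In an orchestrator, calling block_on_bad_data(run_programmatic_scan("postgres_retail", "configuration.yml", "checks.yml")) inside a task makes that task fail on bad data, which stops downstream tasks. To surface issues without blocking, log the result instead of raising.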

Add scan options

When you run a scan in Soda Core, you can specify options that modify the scan actions or output. Add one or more of the following options to a soda scan command.

  • -c TEXT or --configuration TEXT: (Required) Use this option to specify the file path and file name of the configuration YAML file.
  • -d TEXT or --data-source TEXT: (Required) Use this option to specify the data source that contains the datasets you wish to scan.
  • -s TEXT or --scan-definition TEXT
  • -v TEXT or --variable TEXT: Replace TEXT with variables you wish to apply to the scan, such as a filter for a date. Put single or double quotes around any value with spaces. For example:
    soda scan -d my_datasource -v start=2020-04-12 -c configuration.yml checks.yml
  • -V or --verbose: Return scan output in verbose mode to review query details.
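When a script generates the variables, Python's shlex.join can take care of the quoting rule for values with spaces. This is a minimal sketch, assuming you pass one -v option per variable; variable_args and shell_command are hypothetical names.

```python
import shlex
from typing import Dict, List


def variable_args(variables: Dict[str, str]) -> List[str]:
    """Turn {"start": "2020-04-12"} into ["-v", "start=2020-04-12"], one -v per variable (assumed)."""
    args: List[str] = []
    for name, value in variables.items():
        args += ["-v", f"{name}={value}"]
    return args


def shell_command(data_source: str, configuration: str, checks: str, variables: Dict[str, str]) -> str:
    """Render a copy-pasteable command; shlex.join single-quotes any token containing spaces."""
    argv = ["soda", "scan", "-d", data_source, *variable_args(variables), "-c", configuration, checks]
    return shlex.join(argv)
```

With {"start": "2020-04-12"} this reproduces the -v example above unchanged, while a value such as "North America" is emitted as 'region=North America' so the shell passes it as a single argument.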

Go further


Last modified on 01-Jul-22
