
Soda Core scan reference

A scan is a command that executes checks to extract information about data in a table.

Soda Core uses the input in the checks YAML file to prepare SQL queries that it runs against the data in a table. A single scan can execute checks against multiple datasets in a data source.

Anatomy of a scan command
Variables
Scan output

Anatomy of a scan command

Each scan requires the following as input:

  • the name of the data source that contains the datasets you wish to scan, identified using the -d option
  • a configuration.yml file, which contains details about how Soda Core can connect to your data source, identified using the -c option
  • a checks.yml file, which contains the checks you write using SodaCL; include its filepath if the file is stored in a different directory

Scan command:

soda scan -d postgres_retail checks.yml

Scan command with explicit configuration.yml path:

soda scan -d postgres_retail -c configuration.yml checks.yml
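
When a scan is launched from a script rather than typed by hand, the inputs listed above map directly onto an argument list. The following is a minimal Python sketch, not part of Soda Core: build_scan_command is a hypothetical helper that only assembles the -d and -c options and the checks file path shown in the commands above. Running the result assumes the soda executable is available on your PATH.

```python
from typing import List, Optional


def build_scan_command(data_source: str, checks_file: str,
                       configuration: Optional[str] = None) -> List[str]:
    """Assemble the argument list for a `soda scan` call (hypothetical helper)."""
    command = ["soda", "scan", "-d", data_source]
    if configuration is not None:
        # -c points at an explicit configuration.yml file
        command += ["-c", configuration]
    # The checks file, with its filepath if needed, comes last
    command.append(checks_file)
    return command


if __name__ == "__main__":
    cmd = build_scan_command("postgres_retail", "checks.yml", "configuration.yml")
    print(" ".join(cmd))
    # To run the scan for real (assumes Soda Core is installed):
    #   import subprocess
    #   subprocess.run(cmd)
```

Keeping the command as a list rather than a single string avoids shell quoting problems when a filepath contains spaces.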


Use the soda scan --help command to review options you can include to customize the scan.

Variables

To test specific portions of data, such as data pertaining to a specific date, you can apply dynamic variables when you scan data in your warehouse. See Use variables in Soda Core for detailed instructions.

Variables are a set of key-value pairs, both of which are strings. In SodaCL, you can refer to variables as ${VAR}.

Soda checks YAML file:

variables:
  hello: world
  sometime_later: ${now}

Scan command:

soda scan -d postgres_retail -v TODAY=2022-03-11 checks.yml
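
To illustrate the key-value model described above, the sketch below parses KEY=VALUE pairs like the one passed with -v and resolves ${VAR} references in a string. It relies on Python's standard string.Template, which happens to share the ${VAR} syntax. This is an illustration only, not how Soda Core itself performs substitution, and parse_variables and resolve are hypothetical names.

```python
from string import Template
from typing import Dict, List


def parse_variables(pairs: List[str]) -> Dict[str, str]:
    """Turn ["TODAY=2022-03-11", ...] into a dict of string keys and values."""
    variables = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "="
        key, _, value = pair.partition("=")
        variables[key] = value
    return variables


def resolve(text: str, variables: Dict[str, str]) -> str:
    """Replace ${VAR} references; names without a value are left untouched."""
    return Template(text).safe_substitute(variables)


variables = parse_variables(["TODAY=2022-03-11"])
print(resolve("date = ${TODAY}", variables))  # date = 2022-03-11
```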

Scan output

During a scan, each check returns one of four statuses that represents its result: pass, fail, warn, or error.

  • If a check passes, you know your data is sound.
  • If a check fails, it means the scan discovered data that falls outside the expected or acceptable parameters you defined in your check.
  • If a check triggers a warning, it means the data falls within the parameters you defined as “worthy of a warning” in your check.
  • If a check returns an error, it means there is a problem with the check itself, such as a syntax error.

Example output with a check that passed:

Soda Core 3.0.0bx
Scan summary:
1/1 check PASSED: 
    CUSTOMERS in postgres_retail
      row_count > 0 [PASSED]
All is good. No failures. No warnings. No errors.

Example output with a check that triggered a warning:

Soda Core 0.0.x
Scan summary:
1/1 check WARNED: 
    CUSTOMERS in postgres_retail
      schema [WARNED]
        missing_column_names = [sombrero]
        schema_measured = [geography_key, customer_alternate_key, title, first_name, last_name ...]
Only 1 warning. 0 failure. 0 errors. 0 pass.

Example output with a check that failed:

Soda Core 0.0.x
Scan summary:
1/1 check FAILED: 
    CUSTOMERS in postgres_retail
      freshness (full_date_alternate_key) < 3d [FAILED]
        max_column_timestamp: 2020-06-24 00:04:10+00:00
        max_column_timestamp_utc: 2020-06-24 00:04:10+00:00
        now_variable_name: NOW
        now_timestamp: 2022-03-10T16:30:12.608845
        now_timestamp_utc: 2022-03-10 16:30:12.608845+00:00
        freshness: 624 days, 16:26:02.608845
Oops! 1 failures. 0 warnings. 0 errors. 0 pass.
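
If you capture scan output in a script, the per-check status tags shown in the examples above can be tallied with a regular expression. The sketch below assumes the output format matches these examples, which may differ between Soda Core versions, and it only recognizes the three tags that appear in them. count_statuses is a hypothetical helper, not a Soda Core function.

```python
import re
from collections import Counter

# Matches a status tag at the end of a check line, e.g. "row_count > 0 [PASSED]"
STATUS_TAG = re.compile(r"\[(PASSED|WARNED|FAILED)\]\s*$")


def count_statuses(output: str) -> Counter:
    """Count check results by status tag in captured scan output."""
    counts = Counter()
    for line in output.splitlines():
        match = STATUS_TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
Scan summary:
    CUSTOMERS in postgres_retail
      freshness (full_date_alternate_key) < 3d [FAILED]
Oops! 1 failures. 0 warnings. 0 errors. 0 pass.
"""
print(count_statuses(sample)["FAILED"])  # 1
```

Detail lines such as missing_column_names = [sombrero] also end in a bracket, which is why the pattern matches only the known status words rather than any bracketed text.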


Last modified on 01-Jul-22
