
Scan multiple datasets or tables

You can run a single scan against different datasets in your environments. For example, you can run the same scan against data in a warehouse in a development environment and data in a warehouse in a production environment.

You can also run a single scan against different tables in your warehouse using custom metrics.

Run a basic scan

When you run a scan, Soda SQL uses the configurations in your scan YAML file to prepare, then run SQL queries against data in your warehouse. The default tests and metrics Soda SQL configured when it created the YAML file focus on finding missing, invalid, or unexpected data in your tables.

Each scan requires the following as input:

  • a warehouse YAML file, which represents a connection to your SQL engine
  • a scan YAML file, including its filepath, which contains the metric and test instructions that Soda SQL uses to scan tables in your warehouse

Example command

$ soda scan warehouse.yml tables/demodata.yml

Scan multiple datasets

To run the same scan against different datasets, proceed as follows.

  1. Prepare one warehouse YAML file for each data warehouse you wish to scan. For example:
    • warehouse_postgres_dev.yml
      name: my_postgres_datawarehouse_dev
      connection:
        type: postgres
        host: localhost
        port: '5432'
        username: env_var(POSTGRES_USERNAME)
        password: env_var(POSTGRES_PASSWORD)
        database: dev
        schema: public
      
    • warehouse_postgres_prod.yml
      name: my_postgres_datawarehouse_prod
      connection:
        type: postgres
        host: dbhost.example.com
        port: '5432'
        username: env_var(POSTGRES_USERNAME)
        password: env_var(POSTGRES_PASSWORD)
        database: prod
        schema: public
      
  2. Prepare a scan YAML file to define all the tests you wish to run against your datasets. See Define tests for details.
  3. Run a separate Soda SQL scan against each dataset, specifying a different warehouse YAML file for each scan and reusing the same scan YAML file. For example:
    soda scan warehouse_postgres_dev.yml tables/my_table_scan.yml 
    soda scan warehouse_postgres_prod.yml tables/my_table_scan.yml
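
If you scan more than two datasets, you can wrap the commands from step 3 in a small shell function. The following is a minimal sketch, not an official Soda SQL script: the function name `scan_all` is hypothetical, and it assumes that `soda scan` returns a non-zero exit status when a scan does not succeed.

```shell
#!/usr/bin/env bash
# Sketch: run one scan YAML file against several warehouse YAML files.
# scan_all is a hypothetical helper, not part of Soda SQL.

scan_all() {
  local scan_file="$1"   # first argument: the shared scan YAML file
  shift                  # remaining arguments: warehouse YAML files
  local failed=0
  local warehouse
  for warehouse in "$@"; do
    echo "Scanning with ${warehouse}"
    # Assumption: a non-zero exit status means the scan did not succeed.
    if ! soda scan "${warehouse}" "${scan_file}"; then
      echo "Scan failed for ${warehouse}" >&2
      failed=1
    fi
  done
  # Return 1 if any scan failed, 0 otherwise.
  return "${failed}"
}
```

To use it with the files from the steps above, run `scan_all tables/my_table_scan.yml warehouse_postgres_dev.yml warehouse_postgres_prod.yml`. The function keeps going after a failed scan so that one environment does not hide the results of another.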
    

Scan multiple tables

Use a single scan YAML file to run tests on different tables in your warehouse.

Prepare one scan YAML file to define the tests you wish to apply against multiple tables. Use custom metrics to write SQL queries and subqueries that run against multiple tables. When you run a scan, Soda SQL uses your SQL queries to query data in the tables you specified in your scan YAML file.
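
As a rough illustration of the approach described above, the following sketch writes a scan YAML file whose custom metric joins two tables. Everything specific in it is an assumption made for illustration: the file name `tables/orders_scan.yml`, the `orders` and `customers` tables, their columns, and the metric name `orphaned_orders` are all hypothetical. It also assumes that custom metrics are declared under a `sql_metrics` key with `sql` and `tests` entries; check the custom metrics documentation for your Soda SQL version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: create a scan YAML file with one custom metric that spans two tables.
# All table, column, and metric names below are hypothetical.

mkdir -p tables

cat > tables/orders_scan.yml <<'EOF'
table_name: orders
sql_metrics:
  - sql: |
      SELECT COUNT(*) AS orphaned_orders
      FROM orders o
      LEFT JOIN customers c ON o.customer_id = c.id
      WHERE c.id IS NULL
    tests:
      - orphaned_orders == 0
EOF
```

You can then run the scan in the usual way, for example `soda scan warehouse.yml tables/orders_scan.yml`. In this sketch, the test fails if any row in `orders` refers to a customer that does not exist in `customers`.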

Example coming soon.
