You can run the same scan against datasets in different environments. For example, you can run one scan against data in a warehouse in a development environment, then run it again against data in a warehouse in a production environment.
When you run a scan, Soda SQL uses the configurations in your scan YAML file to prepare and then run SQL queries against data in your warehouse. The default tests and metrics that Soda SQL configured when it created the YAML file focus on finding missing, invalid, or unexpected data in your tables.
Each scan requires the following as input:
- a warehouse YAML file, which represents a connection to your SQL engine
- a scan YAML file, including its filepath, which contains the metric and test instructions that Soda SQL uses to scan tables in your warehouse
$ soda scan warehouse.yml tables/demodata.yml
To run the same scan against different datasets, proceed as follows.
- Prepare one warehouse YAML file for each data warehouse you wish to scan. For example:
      # development warehouse
      name: my_postgres_datawarehouse_dev
      connection:
        type: postgres
        host: localhost
        port: '5432'
        username: env_var(POSTGRES_USERNAME)
        password: env_var(POSTGRES_PASSWORD)
        database: dev
        schema: public

      # production warehouse
      name: my_postgres_datawarehouse_prod
      connection:
        type: postgres
        host: dbhost.example.com
        port: '5432'
        username: env_var(POSTGRES_USERNAME)
        password: env_var(POSTGRES_PASSWORD)
        database: prod
        schema: public
- Prepare a scan YAML file to define all the tests you wish to run against your datasets. See Define tests for details.
- Run a separate Soda SQL scan against each dataset: use the same scan YAML file each time, and specify the warehouse YAML file for the warehouse you wish to scan. For example:
      soda scan warehouse_postgres_dev.yml tables/my_table_scan.yml
      soda scan warehouse_postgres_prod.yml tables/my_table_scan.yml
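If you scan more than two warehouses, you may prefer to loop over the warehouse YAML files rather than type each command. The following is a minimal sketch, not part of Soda SQL itself: the `run_scans` function and the `SODA_BIN` and `SCAN_FILE` variables are hypothetical names chosen for illustration, and the sketch assumes that `soda scan` returns a non-zero exit status when a scan fails.

```shell
#!/bin/sh
# Hypothetical helper: run one scan YAML file against several warehouse YAML files.
# SODA_BIN defaults to the soda CLI; set SODA_BIN=echo to print the commands instead.
SODA_BIN="${SODA_BIN:-soda}"
SCAN_FILE="${SCAN_FILE:-tables/my_table_scan.yml}"

run_scans() {
    failed=0
    for warehouse in "$@"; do
        echo "Scanning with $warehouse"
        # Keep going if one scan fails, but remember the failure (assumed non-zero exit).
        "$SODA_BIN" scan "$warehouse" "$SCAN_FILE" || failed=1
    done
    return "$failed"
}

# Pass the warehouse YAML files as arguments to the script.
run_scans "$@"
```

Saved as a script (for example, a hypothetical `run_scans.sh`), it could be invoked as `sh run_scans.sh warehouse_postgres_dev.yml warehouse_postgres_prod.yml`. The function returns a non-zero status if any scan fails, which makes it easier to use in a scheduled job or pipeline.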
Use a single scan YAML file to run tests on different tables in your warehouse.
Prepare one scan YAML file to define the tests you wish to apply against multiple tables. Use custom metrics to write SQL queries and subqueries that run against multiple tables. When you run a scan, Soda SQL uses your SQL queries to query data in the tables you specified in your scan YAML file.
Example coming soon.
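As a rough sketch of the idea, the scan YAML file below combines a default metric with a custom SQL metric whose query joins two tables. It assumes the `sql_metrics` key that Soda SQL uses for custom metrics; the table names (`orders`, `customers`), the column names, and the `orphaned_orders` metric are hypothetical and only illustrate the shape of such a file.

```yaml
# Hypothetical scan YAML file, e.g. tables/my_table_scan.yml
table_name: orders
metrics:
  - row_count
tests:
  - row_count > 0
sql_metrics:
  # Custom metric: count orders whose customer is missing from the customers table.
  - sql: |
      SELECT COUNT(*) AS orphaned_orders
      FROM orders o
      LEFT JOIN customers c ON o.customer_id = c.id
      WHERE c.id IS NULL
    tests:
      - orphaned_orders == 0
```

Because the query names both tables explicitly, a single scan YAML file can test a relationship that spans more than one table in the warehouse.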