After you install Soda SQL, you must create files and configure a few settings before you can run a scan.
- Create a warehouse directory in which to store your warehouse YAML file and `/tables` directory.
- Create a warehouse YAML file and an env_vars YAML file, then adjust the contents of each to input your warehouse connection details.
- Create a scan YAML file for each table that exists in your warehouse. The scan YAML files store the test criteria that Soda SQL uses to prepare SQL queries that scan your warehouse.
- Adjust the contents of your new scan YAML files to add the tests you want to run on your data to check for quality.
Consider following the Quick start tutorial that guides you through configuration and scanning.
- Use your command-line interface to create, then navigate to, a new Soda SQL warehouse directory in your environment. The warehouse directory stores your warehouse YAML file and `/tables` directory. The example below creates a directory named `soda_warehouse_directory`.
    $ mkdir soda_warehouse_directory
    $ cd soda_warehouse_directory
- Use the `create` command to create and pre-populate two files that enable you to configure connection details for Soda SQL to access your warehouse:
  - a `warehouse.yml` file, which stores access details for your warehouse
  - an `env_vars.yml` file, which securely stores warehouse login credentials
  Use `soda create --help` for a list of all available warehouse types and options.
$ soda create warehousetype -d yourdbname -u dbusername -w soda_warehouse_directory
- Use a code editor to open the `warehouse.yml` file that Soda SQL created and put in your warehouse directory. Refer to Set warehouse configurations to adjust the configuration details according to the type of warehouse you use, then save the file.
Example warehouse YAML
    name: soda_warehouse_directory
    connection:
      type: postgres
      host: localhost
      username: env_var(POSTGRES_USERNAME)
      password: env_var(POSTGRES_PASSWORD)
      database: sodasql
      schema: public
- Use a code editor to open the `env_vars.yml` file that Soda SQL created and put in your local user home directory as a hidden file (`~/.soda/env_vars.yml`). Input your warehouse login credentials, then save the file.
Example env_vars YAML
    soda_warehouse_directory:
      POSTGRES_USERNAME: someusername
      POSTGRES_PASSWORD: somepassword
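Because the warehouse YAML refers to credentials by name through `env_var(...)`, a misspelled name in either file is an easy mistake to make. The following is a minimal sketch of a check, not part of Soda SQL: `check_env_vars` is a hypothetical helper that lists every `env_var(NAME)` reference in a warehouse YAML file that has no matching `NAME:` key in an env_vars YAML file. It assumes a `grep` that supports `-o`, and it ignores which warehouse name the key sits under.

```shell
# Hypothetical helper (not a Soda SQL command): report env_var(NAME)
# references in a warehouse YAML file that have no "NAME:" key in an
# env_vars YAML file. Returns 1 if any name is missing, otherwise 0.
check_env_vars() {
  warehouse_file="$1"
  env_vars_file="$2"
  missing=0
  # Extract NAME from each env_var(NAME) occurrence, de-duplicated.
  for name in $(grep -o 'env_var([A-Za-z_][A-Za-z0-9_]*)' "$warehouse_file" \
      | sed 's/^env_var(//; s/)$//' | sort -u); do
    # Simplification: match the key anywhere in the file, regardless of
    # the warehouse name it is nested under.
    if ! grep -q "^[[:space:]]*${name}:" "$env_vars_file"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

From inside the warehouse directory, you could call it as `check_env_vars warehouse.yml "$HOME/.soda/env_vars.yml"`.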
- In your command-line interface, use the `soda analyze` command to get Soda SQL to sift through the contents of your warehouse and automatically prepare a scan YAML file for each table. Soda SQL uses the name of the table to name each YAML file, which it puts in a new `/tables` directory in the warehouse directory.
- Use a code editor to open one of your new scan YAML files. Soda SQL pre-populated the YAML file with default metrics and tests that it deemed useful for the kind of data in the table. See scan YAML.
Adjust the contents of the YAML file to define the tests that you want Soda SQL to conduct when it runs a scan on this table in your warehouse. Refer to Metrics and Tests for details.
Example scan YAML
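As an illustration only, the sketch below writes a small scan YAML file by hand so you can see its general shape. The table name `demodata` and column name `id` are hypothetical, and the metric and test names reflect an assumption about Soda SQL's scan YAML keys; check them against Metrics and Tests before relying on them. In normal use, `soda analyze` generates these files and you edit them.

```shell
# Run from inside the warehouse directory.
# "demodata" and "id" are hypothetical table and column names.
mkdir -p tables

# Write an illustrative scan YAML file: table-level metrics and a test,
# plus one column-level test.
cat > tables/demodata.yml <<'EOF'
table_name: demodata
metrics:
  - row_count
  - missing_count
  - missing_percentage
tests:
  - row_count > 0
columns:
  id:
    tests:
      - missing_count == 0
EOF
```

Each entry under `tests` is an expression that a scan evaluates against the computed metrics; here the scan would fail if the table were empty or if `id` had missing values.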
- With your configuration complete, run your first scan.
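A first scan might look like the sketch below, run from inside the warehouse directory. It assumes the `soda scan` command takes the warehouse YAML file followed by a scan YAML file; `tables/demodata.yml` is a hypothetical file name, so substitute one of the files that `soda analyze` created.

```shell
# Hypothetical scan YAML path; replace with a file from your /tables directory.
SCAN_FILE="tables/demodata.yml"

# Only attempt the scan if Soda SQL is installed and the files exist.
if command -v soda >/dev/null 2>&1 && [ -f warehouse.yml ] && [ -f "$SCAN_FILE" ]; then
  soda scan warehouse.yml "$SCAN_FILE"
else
  echo "Skipping scan: soda, warehouse.yml, or $SCAN_FILE not found"
fi
```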
The following table lists `soda create` commands by warehouse type. Use `soda create --help` for a list of all available warehouse types and options.
| Warehouse type | Command |
| --- | --- |
| Amazon Athena | `soda create athena` |
| Amazon Redshift | `soda create redshift` |
| Apache Hive | `soda create hive` |
| GCP BigQuery | `soda create bigquery` |
| MS SQL Server | `soda create sqlserver` |
| PostgreSQL | `soda create postgres` |
| Snowflake | `soda create snowflake` |