After you install Soda SQL, you must create files and configure a few settings before you can run a scan.
- Create a warehouse directory in which to store your warehouse YAML file and /tables directory.
- Get Soda SQL to create a warehouse YAML file and an env_vars YAML file, then adjust the contents of each to input your data source connection details.
- Get Soda SQL to discover all the datasets in your data source and create a scan YAML file for each dataset. The scan YAML files store the test criteria that Soda SQL uses to prepare SQL queries that scan your data source.
- Adjust the contents of your new scan YAML files to add the tests you want to run on your data to check for quality.
Consider following the Quick start tutorial that guides you through configuration and scanning.
- Use your command-line interface to create, then navigate to a new Soda SQL warehouse directory in your environment. The warehouse directory stores your warehouse YAML files and /tables directory. The example below creates a directory named soda_warehouse_directory.
$ mkdir soda_warehouse_directory
$ cd soda_warehouse_directory
- Use the data source-specific create command (see list below) to create and pre-populate two files that enable you to configure connection details for Soda SQL to access your data source:
  - a warehouse.yml file, which stores access details for your data source
  - an env_vars.yml file, which securely stores data source login credentials
  Use soda create --help for a list of all available data source types and options.
$ soda create warehousetype -d yourdbname -u dbusername -w soda_warehouse_directory
- Use a code editor to open the warehouse.yml file that Soda SQL created and put in your warehouse directory. Refer to Datasource configuration to adjust the configuration details and authentication settings according to the type of data source you use, then save the file.
Example warehouse YAML
name: soda_warehouse_directory
connection:
  type: postgres
  host: localhost
  username: env_var(POSTGRES_USERNAME)
  password: env_var(POSTGRES_PASSWORD)
  database: sodasql
  schema: public
- Use a code editor to open the env_vars.yml that Soda SQL created and put in your local user home directory as a hidden file (~/.soda/env_vars.yml). Use the command ls ~/.soda/env_vars.yml to locate the file. Input your data source login credentials then save the file.
Example env_vars YAML
soda_warehouse_directory:
  POSTGRES_USERNAME: someusername
  POSTGRES_PASSWORD: somepassword
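Because the warehouse YAML refers to credentials by name through env_var(...), a mismatch between the names in the two files is an easy mistake to make. The shell sketch below is not part of Soda SQL. It lists any env_var(NAME) reference in a warehouse YAML file that has no matching key in an env_vars YAML file. The function name missing_env_vars is hypothetical, and the check is a plain text match that assumes one key per line, as in the examples above.

```shell
# Hypothetical helper (not a Soda SQL command): prints each NAME that appears
# as env_var(NAME) in the warehouse YAML but has no "NAME:" key in the
# env_vars YAML. Prints nothing when every reference has a matching key.
# Example call: missing_env_vars warehouse.yml ~/.soda/env_vars.yml
missing_env_vars() {
  warehouse_file=$1
  env_vars_file=$2
  # Extract every env_var(NAME) reference, strip the wrapper, de-duplicate.
  grep -o 'env_var([A-Za-z_][A-Za-z0-9_]*)' "$warehouse_file" \
    | sed 's/^env_var(\(.*\))$/\1/' \
    | sort -u \
    | while read -r name; do
        # Report the name only if no line in env_vars starts with "NAME:".
        grep -q "^[[:space:]]*${name}:" "$env_vars_file" || echo "$name"
      done
}
```

With the two example files shown above, the function should print nothing, because both POSTGRES_USERNAME and POSTGRES_PASSWORD are defined.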
- In your command-line interface, use the soda analyze command to get Soda SQL to sift through the contents of your data source and automatically prepare a scan YAML file for each dataset. Soda SQL uses the name of the dataset to name each YAML file, which it puts in a new /tables directory in the warehouse directory.
- Use a code editor to open one of your new scan YAML files. Soda SQL pre-populated the YAML file with built-in metrics and tests that it deemed useful for the kind of data in the dataset. See scan YAML.
Adjust the contents of the YAML file to define the tests that you want Soda SQL to conduct when it runs a scan on this dataset in your data source. Refer to Metrics and Tests for details.
Example scan YAML
- With your configuration complete, run your first scan.
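Before running that first scan, it can help to confirm that the files described in the steps above are in place. The following is a minimal sketch of such a check, not a Soda SQL feature. The function name check_soda_setup is hypothetical, and the sketch assumes that scan YAML files use the .yml extension and sit in a tables directory inside the warehouse directory.

```shell
# Hypothetical helper (not a Soda SQL command): reports which of the files
# described on this page are missing. Returns 0 only if all are present.
# Example call: check_soda_setup soda_warehouse_directory
check_soda_setup() {
  warehouse_dir=$1
  missing=0
  # The warehouse YAML file lives in the warehouse directory.
  if [ ! -f "$warehouse_dir/warehouse.yml" ]; then
    echo "missing: $warehouse_dir/warehouse.yml"
    missing=1
  fi
  # The env_vars YAML file is a hidden file in the user home directory.
  if [ ! -f "$HOME/.soda/env_vars.yml" ]; then
    echo "missing: $HOME/.soda/env_vars.yml"
    missing=1
  fi
  # Assumption: soda analyze writes one .yml scan file per dataset here.
  if ! ls "$warehouse_dir"/tables/*.yml >/dev/null 2>&1; then
    echo "missing: scan YAML files in $warehouse_dir/tables"
    missing=1
  fi
  return "$missing"
}
```

The function prints one line per missing item, so an empty output together with a zero exit status suggests the configuration steps completed.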
Use soda create --help for a list of all available data source types and options.
| Data source | Command |
| --- | --- |
| Amazon Athena | soda create athena |
| Amazon Redshift | soda create redshift |
| Apache Hive | soda create hive |
| GCP BigQuery | soda create bigquery |
| MS SQL Server | soda create sqlserver |
| PostgreSQL | soda create postgres |
| Snowflake | soda create snowflake |
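To make the pattern concrete, the sketch below combines a data source type from the table with the -d, -u, and -w options shown in the create step. It only prints the command rather than running it. The helper name print_create_command is hypothetical, and the database, username, and warehouse values are taken from the example YAML files on this page.

```shell
# Hypothetical helper: prints (does not run) a create command.
# Arguments: data source type, database name, username, warehouse name.
print_create_command() {
  printf 'soda create %s -d %s -u %s -w %s\n' "$1" "$2" "$3" "$4"
}

print_create_command postgres sodasql someusername soda_warehouse_directory
# prints: soda create postgres -d sodasql -u someusername -w soda_warehouse_directory
```

Swapping the first argument for another type from the table, such as snowflake, yields the corresponding command for that data source; soda create --help remains the authoritative list of options.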
- Next, run a scan on the data in your data source.
- Learn more about the scan YAML file.
- Learn more about the warehouse YAML file.
- Learn more about How Soda SQL works.
- Learn more about configuring tests and metrics.
- Need help? Join the Soda community on Slack.
Last modified on 16-Jul-21