Link Search Menu Expand Document

Configure Soda SQL

After you install Soda SQL, you must create files and configure a few settings before you can run a scan.

Overview of configuration

  1. Create a warehouse directory in which to store your warehouse YAML file and /tables directory.
  2. Get Soda SQL to create a warehouse YAML file and an env_vars YAML file, then adjust the contents of each to input your data source connection details.
  3. Get Soda SQL to discover all the datasets in your data source and create a scan YAML file for each dataset. The scan YAML files store the test criteria that Soda SQL uses to prepare SQL queries that scan your data source.
  4. Adjust the contents of your new scan YAML files to add the tests you want to run on your data to check for quality.

Consider following the Quick start tutorial that guides you through configuration and scanning.

Configuration instructions

  1. Use your command-line interface to create, then navigate to a new Soda SQL warehouse directory in your environment. The warehouse directory stores your warehouse YAML files and /tables directory. The example below creates a directory named soda_warehouse_directory.

    $ mkdir soda_warehouse_directory
    $ cd soda_warehouse_directory
  2. Use the data source-specific create command (see list below) to create and pre-populate two files that enable you to configure connection details for Soda SQL to access your data source:
    • a warehouse.yml file which stores access details for your data source (read more)
    • an env_vars.yml file which securely stores data source login credentials (read more)

      Use soda create --help for a list of all available data source types and options.
      $ soda create warehousetype -d yourdbname -u dbusername -w soda_warehouse_directory 
  3. Use a code editor to open the warehouse.yml file that Soda SQL created and put in your warehouse directory. Refer to Datasource configuration to adjust the configuration details and authentication settings according to the type of data source you use, then save the file.

    Example warehouse YAML
    name: soda_warehouse_directory
      type: postgres
      host: localhost
      username: env_var(POSTGRES_USERNAME)
      password: env_var(POSTGRES_PASSWORD)
      database: sodasql
      schema: public
  4. Use a code editor to open the env_vars.yml that Soda SQL created and put in your local user home directory as a hidden file (~/.soda/env_vars.yml). Use the command ls ~/.soda/env_vars.yml to locate the file. Input your data source login credentials then save the file.

    Example env_vars YAML
      POSTGRES_USERNAME: someusername
      POSTGRES_PASSWORD: somepassword
  5. In your command-line interface, use the soda analyze command to get Soda SQL to sift through the contents of your data source and automatically prepare a scan YAML file for each dataset. Soda SQL uses the name of the dataset to name each YAML file which it puts a new /tables directory in the warehouse directory.

    soda analyze
  6. Use a code editor to open one of your new scan YAML files. Soda SQL pre-populated the YAML file with built-in metrics and tests that it deemed useful for the kind of data in the dataset. See scan YAML.
    Adjust the contents of the YAML file to define the tests that you want Soda SQL to conduct when it runs a scan on this dataset in your data source. Refer to Metrics and Tests for details.

    Example scan YAML
    configure yaml
  7. With your configuration complete, run your first scan.

Create commands

Use soda create --help for a list of all available data source types and options.

Warehouse type Command
Amazon Athena soda create athena
Amazon Redshift soda create redshift
Apache Hive soda create hive
GCP BigQuery soda create bigquery
MS SQL Server soda create sqlserver
PostgreSQL soda create postgres
Snowflake soda create snowflake

Go further

Last modified on 16-Jul-21

Was this documentation helpful?
Give us your feedback in the #soda-docs channel in the Soda community on Slack or open an issue in GitHub.