
Install Soda SQL

Soda SQL is a command-line interface (CLI) tool that enables you to scan the data in your data source to surface invalid, missing, or unexpected data.

Compatibility
Requirements
Install
Upgrade
Troubleshoot
Go further

Compatibility

Use Soda SQL to scan a variety of data sources:

Amazon Athena
Amazon Redshift
Apache Hive (experimental)
Apache Spark (experimental)
GCP Big Query
MySQL (experimental)
Microsoft SQL Server (experimental)
PostgreSQL
Snowflake

Requirements

To use Soda SQL, you must have installed the following on your system.

  • Python 3.7 or greater. To check your existing version, use the CLI command: python --version
  • Pip 21.0 or greater. To check your existing version, use the CLI command: pip --version
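
The two version checks above can be scripted. The following is a minimal sketch, assuming a system where `sort -V` (version sort) is available; the function name `version_ge` is hypothetical and is not part of Soda SQL.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available on this system.
version_ge() {
    # The lower of the two versions sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (commented out so that sourcing this snippet has no side effects):
# py_version=$(python --version 2>&1 | awk '{print $2}')
# pip_version=$(pip --version | awk '{print $2}')
# version_ge "$py_version" 3.7   || echo "Python 3.7 or greater is required"
# version_ge "$pip_version" 21.0 || echo "Pip 21.0 or greater is required"
```

Comparing with a version sort rather than a plain string comparison matters here, because a string comparison would rank 3.10 below 3.7.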

For Linux users only, install the following:

  • On Debian Buster: apt-get install g++ unixodbc-dev python3-dev libssl-dev libffi-dev
  • On CentOS 8: yum install gcc-c++ unixODBC-devel python38-devel libffi-devel openssl-devel

For MSSQL Server users only, install the following:

Install

From your command-line interface tool, execute the following command, replacing soda-sql-athena with the install package that matches the type of data source you use to store data.

$ pip install soda-sql-athena

Data source                   Install package
Amazon Athena                 soda-sql-athena
Amazon Redshift               soda-sql-redshift
Apache Hive                   soda-sql-hive
Apache Spark (experimental)   soda-sql-spark
GCP Big Query                 soda-sql-bigquery
MS SQL Server (experimental)  soda-sql-sqlserver
MySQL (experimental)          soda-sql-mysql
PostgreSQL                    soda-sql-postgresql
Snowflake                     soda-sql-snowflake
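
Every install package listed above follows the pattern soda-sql-<datasource>. As an illustration only, the sketch below validates a short data source name against that list and prints the matching package. The function name `soda_package_for` and the short names it accepts are assumptions made for this example; they are not part of Soda SQL.

```shell
# Hypothetical helper: prints the install package for a short data source name.
# The accepted names are the package suffixes from the list above.
soda_package_for() {
    case "$1" in
        athena|redshift|hive|spark|bigquery|sqlserver|mysql|postgresql|snowflake)
            printf 'soda-sql-%s\n' "$1"
            ;;
        *)
            printf 'unknown data source: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

# Example usage (commented out so that sourcing this snippet installs nothing):
# pip install "$(soda_package_for postgresql)"
```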

Optionally, you can install Soda SQL in a virtual environment. Execute the following commands one by one:

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install soda-sql-yourdatasource

To deactivate the virtual environment, use the command: deactivate.
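
Before running pip install, it can help to confirm that the virtual environment is active. The sketch below relies on the `VIRTUAL_ENV` variable, which the venv `activate` script sets and `deactivate` unsets; the function name `in_virtualenv` is hypothetical.

```shell
# Hypothetical helper: succeeds when a Python virtual environment is active.
# The venv activate script exports VIRTUAL_ENV; deactivate unsets it.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Example usage (commented out so that sourcing this snippet has no side effects):
# in_virtualenv || echo "No virtual environment is active; run: source .venv/bin/activate"
```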

Upgrade

To upgrade Soda SQL to the latest version, use the following command, replacing soda-sql-athena with the install package that matches the type of data source you use.

pip install soda-sql-athena -U

Troubleshoot

Problem: Soda SQL has known issues when used with pip version 19.
Solution: Upgrade pip to version 20 or greater using the following command:

$ pip install --upgrade pip


Problem: Upgrading Soda SQL does not seem to work.
Solution: Run the following command to skip your local cache when upgrading your Soda SQL version:

$ pip install --upgrade --no-cache-dir soda-sql-yourdatasource
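
To check whether an upgrade changed the installed version, you can compare the Version field that pip show reports before and after upgrading. The following is a minimal sketch; the function name `pip_show_field` is hypothetical, and it assumes the "Field: value" layout of pip show output.

```shell
# Hypothetical helper: reads `pip show` output on stdin and prints the value
# of the field named in $1 (for example Version or Location).
# Assumes each line has the form "Field: value" and the value contains no ": ".
pip_show_field() {
    awk -v key="$1" -F': ' '$1 == key { print $2; exit }'
}

# Example usage (commented out so that sourcing this snippet runs nothing):
# pip show soda-sql-yourdatasource | pip_show_field Version
```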


Problem: I can’t run the soda command in my CLI. It returns command not found: soda.
Solution: If you followed the instructions to install Soda SQL and still received the error, you may need to adjust your $PATH variable.

  1. Run the following command to find the path to your installation of Python, replacing soda-sql-postgresql with the install package that matches the type of warehouse you use if not PostgreSQL:
    pip show soda-sql-postgresql

    The output indicates the Location that looks something like this example:
    ...
    Location: /Users/yourname/Library/Python/3.8/lib/python/site-packages
    ...
    
  2. Add the bin directory that sits alongside that location to your $PATH variable using the export PATH command as follows:
    export PATH=$PATH:/Users/yourname/Library/Python/3.8/bin
  3. Run the soda command again to receive the following output:
    Usage: soda [OPTIONS] COMMAND [ARGS]...
      Soda CLI version 2.1.xxx
    Options:
      --help  Show this message and exit.
    Commands:
      analyze  Analyzes tables in the warehouse and creates scan YAML files...
      create   Creates a new warehouse.yml file and prepares credentials in
            your...
      scan     Computes all measurements and runs all tests on one table.
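
The $PATH adjustment in the steps above can be sketched as two small shell helpers. This is an illustration under an assumption: that scripts are installed in a bin directory next to the lib directory named in the Location output, as in the example paths in the steps. The function names `bin_dir_for_location` and `path_contains` are hypothetical.

```shell
# Hypothetical helper: given the Location reported by `pip show`, prints the
# sibling bin directory. Assumes the layout <prefix>/lib/.../site-packages
# with scripts in <prefix>/bin; a Location without "/lib/" just gets "/bin" appended.
bin_dir_for_location() {
    printf '%s/bin\n' "${1%/lib/*}"
}

# Hypothetical helper: succeeds when directory $1 is already listed in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example usage (commented out so that sourcing this snippet leaves PATH unchanged):
# bin_dir=$(bin_dir_for_location "/Users/yourname/Library/Python/3.8/lib/python/site-packages")
# path_contains "$bin_dir" || export PATH="$PATH:$bin_dir"
```

Checking $PATH before appending avoids adding the same directory again each time the command runs.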
    

Go further



Last modified on 26-Nov-21
