Soda SQL is a command-line interface (CLI) tool that enables you to scan the data in your data source to surface invalid, missing, or unexpected data.
Use Soda SQL to scan a variety of data sources:
- Amazon Athena
- Apache Hive (experimental)
- Apache Spark (experimental)
- GCP Big Query
- MySQL (experimental)
- Microsoft SQL Server (experimental)
To use Soda SQL, you must have the following installed on your system:
- Python 3.7 or greater. To check your existing version, use the CLI command:
$ python --version
- Pip 21.0 or greater. To check your existing version, use the CLI command:
$ pip --version
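If you want to check both prerequisites in one step, for example in a setup script, you could use something like the following. It is a sketch, not a Soda SQL tool. It assumes Python is on your PATH as python3 and runs pip as python3 -m pip. The helper name version_ge is illustrative, and the comparison relies on sort -V (version sort), which GNU coreutils and BSD sort provide.

```shell
# Sketch: check the Python and pip prerequisites for Soda SQL.
# Assumption: Python is on PATH as "python3" and pip is run as "python3 -m pip".

# version_ge A B succeeds when dotted version A is greater than or equal to B.
# An empty A (tool not found) sorts first, so the check fails as expected.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Ask Python for its own version, e.g. "3.8.10"; empty if python3 is missing.
py_version=$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null) || py_version=""
# "pip --version" prints e.g. "pip 21.3.1 from /path/to/pip (python 3.8)"; keep field 2.
pip_version=$(python3 -m pip --version 2>/dev/null | awk '{print $2}') || pip_version=""

if version_ge "$py_version" 3.7; then
  echo "Python $py_version: OK"
else
  echo "Python 3.7 or greater is required (found: ${py_version:-none})"
fi

if version_ge "$pip_version" 21.0; then
  echo "pip $pip_version: OK"
else
  echo "pip 21.0 or greater is required (found: ${pip_version:-none})"
fi
```

Because version_ge compares version components numerically, 3.10 correctly counts as newer than 3.7, which a plain string comparison would get wrong.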
For Linux users only, install the following:
- On Debian Buster:
apt-get install g++ unixodbc-dev python3-dev libssl-dev libffi-dev
- On CentOS 8:
yum install gcc-c++ unixODBC-devel python38-devel libffi-devel openssl-devel
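If you script your machine setup, a small helper can choose between the two package lists above. This is a hypothetical sketch, not part of Soda SQL: deps_command is an illustrative name, it keys only on the ID field of /etc/os-release (so it does not verify Debian Buster or CentOS 8 specifically), and it prints the command instead of running it so that you can review it and run it with root privileges yourself.

```shell
# Hypothetical helper (not part of Soda SQL): print the system-package command
# documented for a given distribution ID, as found in /etc/os-release.
deps_command() {
  case "$1" in
    debian) echo "apt-get install g++ unixodbc-dev python3-dev libssl-dev libffi-dev" ;;
    centos) echo "yum install gcc-c++ unixODBC-devel python38-devel libffi-devel openssl-devel" ;;
    *) return 1 ;;  # no package list is documented for other distributions
  esac
}

# Read the distribution ID in a subshell so the os-release variables do not leak.
if [ -r /etc/os-release ]; then
  distro_id=$(. /etc/os-release && echo "$ID")
  if cmd=$(deps_command "$distro_id"); then
    echo "Run with root privileges: $cmd"
  else
    echo "No package list is documented here for '$distro_id'"
  fi
fi
```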
For MSSQL Server users only, install the following:
From your command-line interface tool, execute the following command, replacing soda-sql-athena with the install package that matches the type of data source you use.
$ pip install soda-sql-athena
|Data source|Install package|
|---|---|
|Apache Spark (experimental)|soda-sql-spark|
|GCP Big Query|soda-sql-bigquery|
|MS SQL Server (experimental)|soda-sql-sqlserver|
Optionally, you can install Soda SQL in a virtual environment. Execute the following commands one by one:
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install soda-sql-yourdatasource
To deactivate the virtual environment, use the command:
deactivate
To upgrade your existing Soda SQL tool to the latest version, use the following command, replacing soda-sql-athena with the install package that matches the type of data source you use.
pip install soda-sql-athena -U
Problem: There are known issues with Soda SQL when using pip version 19.
Solution: Upgrade pip to the latest version, which also meets the pip 21.0 or greater prerequisite, using the following command:
$ pip install --upgrade pip
Problem: Upgrading Soda SQL does not seem to work.
Solution: Run the following command to skip your local cache when upgrading your Soda SQL version:
$ pip install --upgrade --no-cache-dir soda-sql-yourdatasource
Problem: I can’t run the soda command in my CLI. It returns command not found: soda.
Solution: If you followed the instructions to install Soda SQL and still receive the error, you may need to adjust your $PATH variable:
- Run the following command to find where pip installed Soda SQL, replacing soda-sql-postgresql with the install package that matches the type of data source you use if it is not PostgreSQL:
pip show soda-sql-postgresql
The output includes a Location line similar to this example:
...
Location: /Users/yourname/Library/Python/3.8/lib/python/site-packages
...
- Replace lib/python/site-packages at the end of that location with bin, then add the resulting directory to your $PATH variable using the export command:
export PATH=$PATH:/Users/yourname/Library/Python/3.8/bin
- Run the soda command again. The output looks like the following:
Usage: soda [OPTIONS] COMMAND [ARGS]...
  Soda CLI version 2.1.xxx
Options:
  --help  Show this message and exit.
Commands:
  analyze  Analyzes tables in the warehouse and creates scan YAML files...
  create   Creates a new warehouse.yml file and prepares credentials in your...
  scan     Computes all measurements and runs all tests on one table.
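The $PATH adjustment can also be scripted. The following is a hedged sketch rather than an official Soda SQL step: it assumes Soda SQL was installed into the per-user site, as in the example Location, and instead of reading the Location from pip show it asks Python for the user base with python3 -m site --user-base, whose bin subdirectory holds console scripts such as soda.

```shell
# Sketch: put the directory holding pip's per-user console scripts (such as soda) on PATH.
# Assumption: Soda SQL was installed into the per-user site, like the example Location.
# "python3 -m site --user-base" prints that base, e.g. /Users/yourname/Library/Python/3.8.
# It can exit non-zero (e.g. inside a virtual environment, where user site-packages
# are disabled) while still printing the path, hence the "|| true".
user_base=$(python3 -m site --user-base) || true
user_bin="$user_base/bin"

# Append the directory only if PATH does not already contain it.
case ":$PATH:" in
  *":$user_bin:"*) ;;
  *) export PATH="$PATH:$user_bin" ;;
esac

# Print where the shell now finds soda, or report that it still cannot.
command -v soda || echo "soda is still not on PATH; check the Location from pip show"
```

Run these lines directly in your shell or source them; executing them as a separate script changes PATH only for that script's process. To keep the change across sessions, add the export line to your shell startup file, for example ~/.zshrc or ~/.bashrc.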
- Next, configure Soda SQL to connect to your warehouse.
- Learn How Soda SQL works.
- Need help? Join the Soda community on Slack.
Last modified on 26-Nov-21