Troubleshoot Soda SQL
Install and upgrade Soda SQL
Problem: There are known issues on Soda SQL when using pip version 19.
Solution: Upgrade pip
to version 20 or greater using the following command:
$ pip install --upgrade pip
Problem: Upgrading Soda SQL does not seem to work.
Solution: Run the following command to skip your local cache when upgrading your Soda SQL version:
$ pip install --upgrade --no-cache-dir soda-sql-yourdatasource
Problem: I can’t run the soda
command in my CLI. It returns command not found: soda
.
Solution: If you followed the instructions to install Soda SQL and still received the error, you may need to adjust your $PATH
variable.
- Run the following command to find the path to your installation of Python, replacing
soda-sql-postgresql
with the install package that matches the type of warehouse you use if not PostgreSQL:
pip show soda-sql-postgresql
The output indicates the Location that looks something like this example:... Location: /Users/yourname/Library/Python/3.8/lib/python/site-packages ...
- Add the location to your
$PATH
variable using theexport PATH
command as follows:
'export PATH=$PATH:/Users/yourname/Library/Python/3.8/bin soda'
- Run the
soda
command again to receive the following output:
Usage: soda [OPTIONS] COMMAND [ARGS]... Soda CLI version 2.1.xxx Options: --help Show this message and exit. Commands: analyze Analyze tables and scaffold SCAN YAML create Create a template warehouse.yml file ingest Ingest test information from different tools scan Compute metrics and run tests for a given table
Problem: When you run soda analyze
you get an an authentication error.
Solution: Check to see if you have another instance of Postgres already running on port 5432. If so, try stopping or uninstalling the Postgres instance, then run soda analyze
again.
Scans and tests with Soda SQL
Problem: Soda SQL scans produce errors in the CLI.
Solution: Check your warehouse.yml
, env_vars.yml
, and scan YAML files for proper spacing, indentation, and verbiage. See Warehouse YAML and Scan YAML.
Problem: When I run a scan, I get this error.
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 431: ordinal not in range(256)
Solution: Soda SQL does not support scans of tables using Latin-1 encoding. Adjust the tables to UTF-8 encoding to run a scan.
Problem: When using an historic metric, you get an scan error message that advises you that there are insufficient measurements to complete the scan.
Solution: The Cloud Metric Store does not contain enough historic measurements to execute the test you have defined. For example, if you defined a test to count back to the seventh historic measurement but your Cloud Metric Store only contains three historic measurements, Soda SQL cannot complete the scan. Consider lowering the count-back number in your test, then run the scan again.
Data source connections with Soda SQL
Problem: I get errors in the CLI when I run soda analyze
on my MS SQL server data source.
Solution: Connecting Soda SQL to MS SQL servers is still in experimental phase. You may encounter errors before this data source connection type is stabilized.
Problem: I use Amazon Athena and I’m having trouble connecting Soda SQL to data in an S3 bucket.
Solution: If you have followed the instructions to configure your warehouse YAML and have added your AWS access key ID and secret access key to your env_vars YAML file but are still not connecting, you may need to adjust the S3 bucket setup and user profile permissions.
When you connect Soda SQL to Athena to run scans on data in an S3 bucket, Soda SQL uses PyAthena (the Python DB API client for Amazon Athena) to output its scan results to the staging directory in the same S3 bucket. As such, the user profile that Soda SQL uses to connect to Athena (the Athena access keys you configured in your env_vars YAML files) must have write permission to the staging directory. If the user profile doesn’t have the correct write permissions, the issue manifests in Soda SQL as a connection error. Instead, as best practice, separate the buckets.
- Create a new Athena account and restrict the user profile with a policy that specifies that it can only write to the S3 bucket with the staging directory.
- Create a new S3 bucket for your staging directory, then configure a separate warehouse YAML to access the staging directory. For example:
name: athena-query-results connection: type: athena catalog: AwsDataCatalog database: test access_key_id: env_var(AWS_ACCESS_KEY_ID) secret_access_key: env_var(AWS_SECRET_ACCESS_KEY) role_arn: region: eu-west-1 staging_dir: <YOUR STAGING PATH IN AWS S3> ...
However, if you do not want to create a new S3 bucket for the staging directory, you can adjust the existing settings for the user profile you use to connect Soda SQL to Athena to include the following actions:
GetBucketLocation
ListAllMyBuckets
ListBucket
GetObject
ListBucketMultipartUploads
ListMultipartUploadParts
AbortMultipartUpload
PutObject
Soda Cloud
Problem: You open the monitor whose test failed during a scan but cannot click the Failed Rows tab.
Solution: Click a failed data point in the chart that shows the monitor’s scan results over time. This action identifies the specific set of failed rows associated with an individual scan result so it can display the failed rows associated with that individual scan.
Last modified on 10-Aug-22