Link Search Menu Expand Document


Need help? Join the Soda community on Slack.

Install and upgrade

Problem: There are known issues on Soda SQL when using pip version 19.
Solution: Upgrade pip to version 20 or greater using the following command:

$ pip install --upgrade pip

Problem: Upgrading Soda SQL does not seem to work.
Solution: Run the following command to skip your local cache when upgrading your Soda SQL version:

$ pip install --upgrade --no-cache-dir soda-sql-yourdatasource

Problem: I can’t run the soda command in my CLI. It returns command not found: soda.
Solution: If you followed the instructions to install Soda SQL and still received the error, you may need to adjust your $PATH variable.

  1. Run the following command to find the path to your installation of Python, replacing soda-sql-postgresql with the install package that matches the type of warehouse you use if not PostgreSQL:
    pip show soda-sql-postgresql

    The output indicates the Location that looks something like this example:
    Location: /Users/yourname/Library/Python/3.8/lib/python/site-packages
  2. Add the location to your $PATH variable using the export PATH command as follows:
    'export PATH=$PATH:/Users/yourname/Library/Python/3.8/bin soda'
  3. Run the soda command again to receive the following output:
    Usage: soda [OPTIONS] COMMAND [ARGS]...
      Soda CLI version
      --help  Show this message and exit.
      analyze  Analyzes tables in the warehouse and creates scan YAML files...
      create   Creates a new warehouse.yml file and prepares credentials in
      scan     Computes all measurements and runs all tests on one table.

Scans and tests

Problem: Soda SQL scans produce errors in the CLI.
Solution: Check your warehouse.yml, env_vars.yml, and scan YAML files for proper spacing, indentation, and verbiage. See Warehouse YAML and Scan YAML.

Problem: When I run a scan, I get this error.

UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 431: ordinal not in range(256)

Solution: Soda SQL does not support scans of tables using Latin-1 encoding. Adjust the tables to UTF-8 encoding to run a scan.

Data source connections

Problem: I get errors in the CLI when I run soda analyze on my MS SQL server data source.
Solution: Connecting Soda SQL to MS SQL servers is still in experimental phase. You may encounter errors before this data source connection type is stabilized.

Problem: I use Amazon Athena and I’m having trouble connecting Soda SQL to data in an S3 bucket.
Solution: If you have followed the instructions to configure your warehouse YAML and have added your AWS access key ID and secret access key to your env_vars YAML file but are still not connecting, you may need to adjust the S3 bucket setup and user profile permissions.

When you connect Soda SQL to Athena to run scans on data in an S3 bucket, Soda SQL uses PyAthena (the Python DB API client for Amazon Athena) to output its scan results to the staging directory in the same S3 bucket. As such, the user profile that Soda SQL uses to connect to Athena (the Athena access keys you configured in your env_vars YAML files) must have write permission to the staging directory. If the user profile doesn’t have the correct write permissions, the issue manifests in Soda SQL as a connection error. Instead, as best practice, separate the buckets.

  1. Create a new Athena account and restrict the user profile with a policy that specifies that it can only write to the S3 bucket with the staging directory.
  2. Create a new S3 bucket for your staging directory, then configure a separate warehouse YAML to access the staging directory. For example:
    name: athena-query-results
     type: athena
     catalog: AwsDataCatalog
     database: test
     access_key_id: env_var(AWS_ACCESS_KEY_ID)
     secret_access_key: env_var(AWS_SECRET_ACCESS_KEY)
     region: eu-west-1
     staging_dir: <YOUR STAGING PATH IN AWS S3>

However, if you do not want to create a new S3 bucket for the staging directory, you can adjust the existing settings for the user profile you use to connect Soda SQL to Athena to include the following actions:

  • GetBucketLocation
  • ListAllMyBuckets
  • ListBucket
  • GetObject
  • ListBucketMultipartUploads
  • ListMultipartUploadParts
  • AbortMultipartUpload
  • PutObject

Last modified on 18-Oct-21

Was this documentation helpful?
Give us your feedback in the #soda-docs channel in the Soda community on Slack or open an issue in GitHub.