Connect Soda to Snowflake
Access configuration details to connect Soda to a Snowflake data source.
For Soda to run quality scans on your data, you must configure it to connect to your data source. To learn how to set up Soda and configure it to connect to your data sources, see Get started.
Connection configuration reference
Install package: soda-snowflake
Core connection:
data_source my_datasource_name:
type: snowflake
# Core
account: ${SNOWFLAKE_ACCOUNT}
database: <database>
schema: <schema>
warehouse: <warehouse>
role: <role> # optional, strongly recommended
connection_timeout: 240 # optional
client_session_keep_alive: false # optional
Choose exactly one authentication approach of the four options below.
1. Username + password
# 1) Username + password
username: ${SNOWFLAKE_USER}
password: ${SNOWFLAKE_PASSWORD}
2. External browser SSO
# 2) External browser SSO (SAML 2.0 IdP such as Okta/OneLogin)
authenticator: externalbrowser
3. Private key authentication
You can use the private_key
and private_key_passphrase
parameters to specify for key pair authentication. In you configuration YML file, add the parameters as per the following example.
# 3) Key pair (JWT) authentication
authenticator: SNOWFLAKE_JWT
username: <user>
private_key: |
-----BEGIN ENCRYPTED PRIVATE KEY-----
-----END ENCRYPTED PRIVATE KEY-----
private_key_passphrase: ${SNOWFLAKE_PASSPHRASE}
# or use a file path instead of inline key:
private_key_path: /path/to/private-key.pk8
4. OAuth 2.0 Client Credentials (NEW)
# 4) OAuth 2.0 Client Credentials
authenticator: OAUTH_CLIENT_CREDENTIALS
oauth_client_id: <client-id>
oauth_client_secret: <client-secret>
oauth_token_request_url: https://<idp>/oauth/token
oauth_scope: "scope1 scope2" # space-delimited; optional if derived by role
Other parameters
# Optional Snowflake session parameters (passed through)
session_parameters:
QUERY_TAG: soda-queries
QUOTED_IDENTIFIERS_IGNORE_CASE: false
# Optional proxies (for agents behind proxy)
proxy_http: http://host:port
proxy_https: https://host:port
Use a values file to store private key authentication values
If you use a private key authentication with Snowflake and have deployed a Soda Agent, you can provide the required private key values in a values.yml
file when you deploy or redeploy the agent.
You can also use the values.yml
file to store other environment variables for the Soda Agent to use, such as SNOFLAKE_USER
, SNOWFLAKE_ACCOUNT
, SNOWFLAKE_PASSPHRASE
, etc.
First, run the following command to create a local path to the Snowflake private key. Replace the
local path to the Snowflake private key
with your own value.
kubectl create secret generic -n <soda-agent-namespace> snowflake-private-key --from-file=snowflake-private-key.pk8=<local path to the Snowflake private key>
Then, add the following to the your
values.yml
file, adjusting the values to your own specific details.soda: scanlauncher: volumeMounts: - name: snowflake-private-key mountPath: /opt/soda/etc volumes: - name: snowflake-private-key secret: secretName: snowflake-private-key items: - key: snowflake-private-key.pk8 path: snowflake-private-key.pk8
Adjust the
configuration.yml
file to include the new path in the connection details, as in the following example.data_source ltsnowflakecustomer: type: snowflake username: ${SNOWFLAKE_USER} password: password account: ${SNOWFLAKE_ACCOUNT} database: PUBLISH_DEV warehouse: ${SNOWFLAKE_WAREHOUSE} role: ${SNOWFLAKE_ROLE} client_session_keep_alive: true session_parameters: QUERY_TAG: soda-queries QUOTED_IDENTIFIERS_IGNORE_CASE: false schema: CUSTOMER private_key_passphrase: ${SNOWFLAKE_PASSPHRASE} private_key_path: /opt/soda/etc/snowflake-private-key.pk8
Deploy, or redeploy, the agent for the changes to take effect.
Test the data source connection
To confirm that you have correctly configured the connection details for the data source(s) in your configuration YAML file, use the test-connection
command. If you wish, add a -V
option to the command to return results in verbose mode in the CLI.
soda test-connection -d my_datasource -c configuration.yml -V
Supported data types
text
CHAR, VARCHAR, CHARACTER, STRING, TEXT
number
NUMBER, INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT, FLOAT, FLOAT4, FLOAT8, DOUBLE, DOUBLE PRECISION, REAL
time
DATE, DATETIME, TIME, TIMESTAMP, TIMESTAMPT_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ
Troubleshoot
Problem: When testing the connection to your Snowflake data source, Snowflake returns an error message about using the use database
command.
Solution: Use the role
parameter to specify a Snowflake role that has permission to access the database
and schema
of your data source.
Though optional, best practice dictates that you provide a value for role
. If you do not provide a role, and Snowflake has not assigned a Snowflake System-Defined Role to the user account, Snowflake may, confusingly, deny access to the data source.
Problem: When Soda attempts to connect to your Snowflake data source, it produces a connectivity error that includes something like RunteimError: Command failed with exit code 2: ..... ocsp_response_validation_cash.lock
.
Solution: Use Snowflake's troubleshooting guide to triage OCSP-related connectivity issues.
Problem: You have defined a Group By check and the scan that executes the check yields an error.
Solution: Be aware that, by default, Snowflake returns columns names in uppercase. Therefore, when defining a Group By check, you must specify the custom name of the check in uppercase as in the following example.
checks for VOLUME:
- group by:
group_limit: 50
query: |
SELECT TRADER, count(*) as trader_row_count
FROM TRADEVOLUME
WHERE TRADE_DATE = '2023-01-01'
GROUP BY TRADER
fields:
- TRADER
checks:
- TRADER_ROW_COUNT > 40:
name: Trader row count
Problem: While attempting to connect Soda to a Snowflake data source using proxy parameters, you encounter an error that reads something similar to Could not connect to data source "name_db": 250001 (08001): Failed to connect to DB: mydb.eu-west-1.snowflakecomputing.com:443. Incoming request with IP/Token xx.xxx.xx.xxx is not allowed to access Snowflake.
data_source: my_data_source
type: snowflake
...
session_param:
QUERY_TAG: soda-test
QUOTED_IDENTIFIERS_IGNORE_CASE: false
proxy_http: http://a-proxy-o-dd-dddd-net:8000
proxy_https: https://a-proxy-o-dd-dddd-net:8000
Solution: When connecting to a Snowflake data source by proxy, be sure to set the new proxy environment variables from the command-line using export statements, as in the following example.
export HTTP_PROXY=http://a-proxy-o-dd-dddd-net:8000
export HTTPS_PROXY=https://a-proxy-o-dd-dddd-net:800
Last updated
Was this helpful?