Connect Soda to Amazon Redshift
Access configuration details to connect Soda to a Redshift data source.
For Soda to run quality scans on your data, you must configure it to connect to your data source. To learn how to set up Soda and configure it to connect to your data sources, see Get started.
Connection configuration reference
Install package: soda-redshift
data_source my_datasource_name:
  type: redshift
  host: 127.0.0.1
  username: simple
  password: simple_pass
  database: soda
  schema: public
  access_key_id: ${KEY_ID}
  secret_access_key: ${ACCESS_KEY}
  role_arn: arn:aws:ec2:us-east-1:123456789012:instance/i-012abcd34exxx56
  region: us-east-1
Property | Required | Notes
type | required | Identify the type of data source for Soda.
host | required | Provide a host identifier.
username | required | If you provide a value for username and password, the connection ignores cluster credentials.
password | required | As above.
database | required | Provide an identifier for your database.
schema | required | Provide an identifier for the schema in which your dataset exists.
access_key_id | required ¹ | Consider using environment variables to retrieve this value securely.
secret_access_key | required ¹ | Consider using environment variables to retrieve this value securely.
role_arn | optional ¹ | Provide an Amazon Resource Name (ARN), which is a string that identifies an AWS resource such as an S3 bucket or EC2 instance. Learn how to find your ARN.
region | optional | Provide an identifier for your geographic area.
session_token | optional | Add a session token to use for authentication and authorization.
profile_name | optional | Specify the profile name from your local AWS configuration to use for authentication and authorization.
¹ access_key_id and secret_access_key are required parameters to obtain an authentication token from Amazon Athena or Redshift. You can provide these key values in the configuration file or as environment variables.
You may add the optional role_arn parameter, which first authenticates with the access keys, then uses the role to obtain temporary tokens that allow for authentication. Depending on your Athena or Redshift setup, you may be able to use only the role_arn to authenticate, though Athena still must access the keys from a config file or environment variables. See AWS Boto3 documentation for details on the progressive steps it takes to access the credentials it needs to authenticate.
Some users who access their Athena or Redshift data source via a self-hosted Soda Agent deployed in a Kubernetes cluster have reported that they can use IAM Roles for Service Accounts to authenticate, as long as the IAM role that the Kubernetes pod receives from the Kubernetes Service Account has permission to access Athena or Redshift. See Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster.
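As a minimal sketch of the environment-variable approach, the shell commands below export the two variables that the example configuration references as ${KEY_ID} and ${ACCESS_KEY}, then write a configuration file that keeps those references intact. The credential values are placeholders, and the file name example-configuration.yml is hypothetical, chosen so the sketch does not overwrite an existing configuration.yml. It assumes that Soda resolves ${...} references from the environment when it reads the file, as the example configuration implies.

```shell
# Placeholder credentials for illustration only; substitute your own values.
export KEY_ID="example-access-key-id"
export ACCESS_KEY="example-secret-access-key"

# The quoted 'EOF' delimiter stops the shell from expanding ${KEY_ID} and
# ${ACCESS_KEY}, so the file stores the references rather than the secrets.
cat > example-configuration.yml <<'EOF'
data_source my_datasource_name:
  type: redshift
  host: 127.0.0.1
  username: simple
  password: simple_pass
  database: soda
  schema: public
  access_key_id: ${KEY_ID}
  secret_access_key: ${ACCESS_KEY}
  region: us-east-1
EOF
```

Keeping the references literal in the file means the configuration can be committed or shared without exposing the keys, which then live only in the shell session that runs Soda.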
Test the data source connection
To confirm that you have correctly configured the connection details for the data source(s) in your configuration YAML file, use the test-connection command. If you wish, add a -V option to the command to return results in verbose mode in the CLI.
soda test-connection -d my_datasource_name -c configuration.yml -V
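Because the example configuration reads access_key_id and secret_access_key from ${KEY_ID} and ${ACCESS_KEY}, a connection test may fail simply because those variables are unset in the current shell. The following sketch checks for them before running the test. The missing_vars helper is hypothetical and is not part of the Soda CLI.

```shell
# Hypothetical helper: print the name of each listed variable that is unset or empty.
missing_vars() {
  for name in "$@"; do
    # eval performs the indirect variable lookup in a POSIX-compatible way.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

missing=$(missing_vars KEY_ID ACCESS_KEY)
if [ -n "$missing" ]; then
  echo "Set these variables before testing the connection:" $missing
else
  echo "Credentials found; run: soda test-connection -d my_datasource_name -c configuration.yml -V"
fi
```

The check reports only variable names, never their values, so it is safe to run in a shared terminal or a CI log.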
Supported data types
Category | Data type
text | CHARACTER VARYING, CHARACTER, CHAR, TEXT, NCHAR, NVARCHAR, BPCHAR
number | SMALLINT, INT2, INTEGER, INT, INT4, BIGINT, INT8
time | DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ