Connect Soda to Amazon Athena
Last modified on 30-Nov-23
For Soda to run quality scans on your data, you must configure it to connect to your data source.
To learn how to set up Soda and configure it to connect to your data sources, see Get started.
Connection configuration reference
Install package: soda-athena
data_source my_datasource_name:
  type: athena
  access_key_id: kk9gDU6800xxxx
  secret_access_key: 88f&eeTuT47xxxx
  region_name: eu-west-1
  staging_dir: s3://s3-results-bucket/output/
  schema: public
Property | Required | Notes |
---|---|---|
type | required | Identify the type of data source for Soda. |
access_key_id | optional 1 | Consider using system variables to retrieve this value securely. See Manage access keys for IAM users. |
secret_access_key | optional 1 | Consider using system variables to retrieve this value securely. |
region_name | optional | The endpoint your AWS account uses. Refer to Amazon Athena endpoints and quotas. |
staging_dir | required | Identify the Amazon S3 Staging Directory (the Query Result Location in AWS); see Specifying a query result location. |
schema | required | Identify the schema in the data source in which your tables exist. |
catalog | optional | Identify the name of the Data Source, also referred to as a Catalog. The default value is awsdatacatalog. |
work_group | optional | Identify a non-default workgroup in your region. In your Athena console, access your current workgroup in the Workgroup option on the upper right. Read more about Athena Workgroups. |
1 Access keys and IAM role are mutually exclusive: if you provide values for access_key_id and secret_access_key, you cannot use an Identity and Access Management (IAM) role; if you provide a value for role_arn, you cannot use the access keys. Refer to Identity and Access Management in Athena for details.
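As a minimal sketch of the advice in the table above, the following configuration reads the access keys from system variables instead of storing them in the file, and sets the optional catalog and work_group properties. It assumes that Soda resolves the ${...} syntax against environment variables when it parses the configuration file. The variable names ATHENA_ACCESS_KEY_ID and ATHENA_SECRET_ACCESS_KEY, the workgroup name, and the role ARN are hypothetical placeholders, not values from this page.

```yaml
data_source my_datasource_name:
  type: athena
  # Hypothetical environment variable names; export them in the shell
  # that runs Soda so that no secrets are stored in this file.
  access_key_id: ${ATHENA_ACCESS_KEY_ID}
  secret_access_key: ${ATHENA_SECRET_ACCESS_KEY}
  # Alternative to the two key properties above (mutually exclusive with them).
  # The ARN below is a hypothetical placeholder.
  # role_arn: arn:aws:iam::123456789012:role/my-soda-role
  region_name: eu-west-1
  staging_dir: s3://s3-results-bucket/output/
  schema: public
  # Optional: awsdatacatalog is the documented default value.
  catalog: awsdatacatalog
  # Optional: hypothetical name of a non-default workgroup.
  work_group: my_workgroup
```

Because access keys and an IAM role are mutually exclusive, use either the two key properties or role_arn in a given data source, never both.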
Test the data source connection
To confirm that you have correctly configured the connection details for the data source(s) in your configuration YAML file, use the test-connection command. If you wish, add a -V option to the command to return results in verbose mode in the CLI.
soda test-connection -d my_datasource_name -c configuration.yml -V
Supported data types
Category | Data type |
---|---|
text | CHAR, VARCHAR, STRING |
number | TINYINT, SMALLINT, INT, INTEGER, BIGINT, DOUBLE, FLOAT, DECIMAL |
time | DATE, TIMESTAMP |
Documentation always applies to the latest version of Soda products