Connect Soda to Amazon Athena

Access configuration details to connect Soda to an Athena data source.

For Soda to run quality scans on your data, you must configure it to connect to your data source. To learn how to set up Soda and configure it to connect to your data sources, see Get started.

Connection configuration reference

Install package: soda-athena

data_source my_datasource_name:
  type: athena
  access_key_id: kk9gDU6800xxxx
  secret_access_key: 88f&eeTuT47xxxx
  region_name: eu-west-1
  staging_dir: s3://s3-results-bucket/output/
  schema: public
| Property | Required | Notes |
| --- | --- | --- |
| type | required | Identify the type of data source for Soda. |
| access_key_id | required 1 | Consider using system variables to retrieve this value securely. See Manage access keys for IAM users. |
| secret_access_key | required 1 | Consider using system variables to retrieve this value securely. See Manage access keys for IAM users. |
| region_name | optional | The endpoint your AWS account uses. Refer to Amazon Athena endpoints and quotas. |
| role_arn | optional 2 | Specify the role to use for authentication and authorization. |
| staging_dir | required | Identify the Amazon S3 Staging Directory (the Query Result Location in AWS); see Specifying a query result location. |
| schema | required | Identify the schema in the data source in which your tables exist. |
| catalog | optional | Identify the name of the Data Source, also referred to as a Catalog. The default value is awsdatacatalog. |
| work_group | optional | Identify a non-default workgroup in your region. In your Athena console, access your current workgroup in the Workgroup option on the upper right. Read more about Athena Workgroups. |
| session_token | optional | Add a session token to use for authentication and authorization. |
| profile_name | optional | Specify the profile name from your local AWS configuration to use for authentication and authorization. |

1 access_key_id and secret_access_key are required parameters to obtain an authentication token from Amazon Athena or Redshift. You can provide these key values in the configuration file or as environment variables.

2 You may add the optional role_arn parameter, which first authenticates with the access keys, then uses the role to obtain temporary tokens that allow for authentication. Depending on your Athena or Redshift setup, you may be able to use only the role_arn to authenticate, though Athena still must access the keys from a config file or environment variables. See AWS Boto3 documentation for details on the progressive steps it takes to access the credentials it needs to authenticate.
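As a minimal sketch of notes 1 and 2, the configuration below reads the access keys from environment variables and adds a role_arn. It assumes Soda's ${VARIABLE} syntax for referencing environment variables in the configuration YAML. The variable names ATHENA_ACCESS_KEY_ID and ATHENA_SECRET_ACCESS_KEY, the AWS account ID, and the role name are hypothetical placeholders; substitute your own values.

```yaml
data_source my_datasource_name:
  type: athena
  # Hypothetical environment variable names; export them in the shell
  # (or inject them as secrets) before running a scan.
  access_key_id: ${ATHENA_ACCESS_KEY_ID}
  secret_access_key: ${ATHENA_SECRET_ACCESS_KEY}
  # Hypothetical role ARN; used after the access keys authenticate.
  role_arn: arn:aws:iam::123456789012:role/my-soda-athena-role
  region_name: eu-west-1
  staging_dir: s3://s3-results-bucket/output/
  schema: public
```

Keeping the keys out of the file means the configuration YAML can be committed to version control without exposing credentials.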

Some users who access their Athena or Redshift data source via a self-hosted Soda Agent deployed in a Kubernetes cluster have reported that they can use IAM roles for Service Accounts to authenticate, as long as the IAM role that the Kubernetes pod has from the Kubernetes Service Account has the permissions to access Athena or Redshift. See Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster.
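For orientation, IRSA on Amazon EKS generally works by annotating a Kubernetes Service Account with the IAM role to assume. The manifest below is a generic sketch of that annotation, not a Soda-specific configuration: the Service Account name, namespace, account ID, and role name are all hypothetical, and how the Soda Agent deployment is bound to a particular Service Account is not covered here.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: soda-agent          # hypothetical Service Account name
  namespace: soda-agent     # hypothetical namespace
  annotations:
    # Hypothetical IAM role; it must carry the permissions needed
    # to access Athena (and the S3 staging directory).
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-soda-athena-role
```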

Test the data source connection

To confirm that you have correctly configured the connection details for the data sources in your configuration YAML file, use the test-connection command. Optionally, add the -V option to return results in verbose mode in the CLI.

soda test-connection -d my_datasource_name -c configuration.yml -V

Supported data types

| Category | Data type |
| --- | --- |
| text | CHAR, VARCHAR, STRING |
| number | TINYINT, SMALLINT, INT, INTEGER, BIGINT, DOUBLE, FLOAT, DECIMAL |
| time | DATE, TIMESTAMP |
