This page lists the supported data source types and their required connection parameters for use with Soda Core.
Soda uses the official Python drivers for each supported data source. The configuration examples below include the default required fields, but you can extend them with any additional parameters supported by the underlying driver.
Each data source configuration must be written in a YAML file and passed as an argument using the CLI or Python API.
General Guidelines
Each configuration must include type, name, and a connection block.
Use the exact structure required by the underlying Python driver.
Test the connection before using the configuration in a contract.
soda data-source test -ds ds.yml
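The guidelines above can be sketched as a small pre-flight helper in Python. This is only an illustration and not part of Soda: the function names `missing_required_keys` and `build_test_command`, the sample connection values, and the file name `ds.yml` are hypothetical. It checks that a configuration, represented here as a plain dictionary, contains the three required top-level keys, and assembles the arguments of the connection test command.

```python
# Top-level keys that every data source configuration must include.
REQUIRED_KEYS = ("type", "name", "connection")


def missing_required_keys(config: dict) -> list[str]:
    """Return the required top-level keys that are absent from config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def build_test_command(config_path: str) -> list[str]:
    """Build the argument list for `soda data-source test -ds <path>`."""
    return ["soda", "data-source", "test", "-ds", config_path]


# Hypothetical configuration; the fields inside `connection` depend on
# the underlying Python driver for the chosen data source type.
example_config = {
    "type": "postgres",
    "name": "my_postgres",
    "connection": {"host": "localhost", "port": 5432},
}

print("Missing keys:", missing_required_keys(example_config))
print("Test command:", " ".join(build_test_command("ds.yml")))
```

The argument list can be passed to `subprocess.run` once the Soda CLI is installed and the configuration file exists.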
Connect to a data source already onboarded in Soda Cloud
If you have already onboarded a data source in Soda Cloud, make sure to use the exact same data source name in your Soda Core data source configuration. This ensures that datasets are correctly identified and mapped to the existing data source in Soda Cloud, whether you run verifications locally with Soda Core or remotely via a Soda Agent.
Onboard a data source programmatically
After you onboard a data source using Soda Core, you can also onboard it to Soda Cloud (and to a Soda Agent).
You can reference environment variables in your data source configuration. This is useful for securely managing sensitive values (like credentials) or dynamically setting parameters based on your environment (e.g., dev, staging, prod).
Example:
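As a minimal sketch, assuming placeholders of the form `${env.NAME}` (confirm the exact placeholder syntax for your Soda version), the Python snippet below scans a configuration for referenced variables and reports any that are not set in the environment where Soda will run. The configuration values, the variable names, and the helpers `referenced_env_vars` and `unset_env_vars` are hypothetical.

```python
import os
import re

# Hypothetical data source configuration. The ${env.NAME} placeholder
# form is an assumption; check it against your Soda version.
CONFIG_TEXT = """\
type: postgres
name: my_postgres
connection:
  host: ${env.POSTGRES_HOST}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
"""

# Matches ${env.NAME} and captures NAME.
PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def referenced_env_vars(config_text: str) -> list[str]:
    """Return referenced variable names in order of first appearance."""
    return list(dict.fromkeys(PLACEHOLDER.findall(config_text)))


def unset_env_vars(config_text: str, environ=None) -> list[str]:
    """Return referenced variables that are missing from environ."""
    env = os.environ if environ is None else environ
    return [name for name in referenced_env_vars(config_text) if name not in env]


print("Referenced:", referenced_env_vars(CONFIG_TEXT))
print("Unset:", unset_env_vars(CONFIG_TEXT))
```

Running a check like this in the same terminal, CI/CD job, or container that executes Soda catches a missing credential before the connection attempt fails.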
Environment variables must be available in the runtime environment where Soda is executed (e.g., your terminal, CI/CD runner, or Docker container).
For Soda to run quality scans on your data, you must configure it to connect to your data source.
To learn how to set up Soda from scratch and configure it to connect to your data sources, see Soda's Quickstart.