Onboard data sources & datasets
Before you can monitor, test, or enforce data contracts in Soda, you need to connect your data source and onboard your datasets.
Onboarding is the process of:
Connecting Soda to your data source (e.g., Snowflake, BigQuery, Databricks, or PostgreSQL)
Discovering available datasets (tables or views)
Selecting which datasets to monitor and/or validate
Configuring observability, testing, and scheduling behavior
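To make the four steps concrete, here is a minimal sketch of the connect, discover, select, configure flow. It does not use Soda's API: it runs against an in-memory SQLite database from Python's standard library, and the helper names (`discover_datasets`, `select_datasets`, `build_onboarding_plan`), the `orders*` pattern, and the shape of the plan dictionary are all hypothetical, chosen only for illustration.

```python
import sqlite3
from fnmatch import fnmatch


def discover_datasets(conn):
    """Step 2: list the tables and views visible on the connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def select_datasets(names, include_patterns):
    """Step 3: keep only datasets whose name matches an include pattern."""
    return [n for n in names if any(fnmatch(n, p) for p in include_patterns)]


def build_onboarding_plan(names, schedule):
    """Step 4: attach hypothetical monitoring and scheduling settings."""
    return {n: {"monitoring": True, "schedule": schedule} for n in names}


# Step 1: connect to the data source (an in-memory stand-in for a warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE tmp_scratch (x INTEGER)")
conn.execute("CREATE VIEW orders_large AS SELECT * FROM orders WHERE amount > 100")

discovered = discover_datasets(conn)
selected = select_datasets(discovered, ["orders*"])
plan = build_onboarding_plan(selected, schedule="daily")

print(discovered)
print(plan)
```

In a real deployment Soda performs these steps for you; the sketch only shows why discovery and selection are separate: a data source usually exposes more tables and views than you want to monitor.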
How you onboard datasets depends on how you deploy Soda.
Choose your onboarding path
Soda supports two main ways to onboard datasets:
Onboarding through Soda Cloud is ideal if you want:
No-code dataset onboarding
Automated dataset discovery
Built-in metric monitoring
Data contracts created in the UI
Centralized scheduling and alerting
Onboarding through Soda Core is ideal if you want:
Full control in code
Git-based workflows
CI/CD integration
Contract verification inside pipelines
Support for in-memory datasets
You can also combine both approaches: use Soda Cloud for centralized observability and governance, and Soda Core for pipeline-level validation.
