Onboard data sources & datasets

Before you can monitor, test, or enforce data contracts in Soda, you need to connect your data source and onboard your datasets.

Onboarding is the process of:

  • Connecting Soda to your data source (for example, Snowflake, BigQuery, Databricks, or PostgreSQL)

  • Discovering available datasets (tables or views)

  • Selecting which datasets to monitor and/or validate

  • Configuring observability, testing, and scheduling behavior
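
The discovery and selection steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Soda's API: it uses the standard-library sqlite3 module as a stand-in data source, and the names discover_datasets, select_datasets, and the include patterns are hypothetical.

```python
import sqlite3
from fnmatch import fnmatch


def discover_datasets(conn):
    """Return (name, type) for every user table and view in the database."""
    rows = conn.execute(
        "SELECT name, type FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [(name, kind) for name, kind in rows]


def select_datasets(datasets, include_patterns):
    """Keep only the dataset names that match at least one include pattern."""
    return [
        name
        for name, _ in datasets
        if any(fnmatch(name, pattern) for pattern in include_patterns)
    ]


# Stand-in data source: an in-memory SQLite database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE tmp_scratch (x INTEGER);
    CREATE VIEW orders_daily AS SELECT COUNT(*) AS n FROM orders;
    """
)

discovered = discover_datasets(conn)  # every table and view
selected = select_datasets(discovered, ["orders*", "customers"])  # datasets to monitor
print(selected)
```

Here tmp_scratch is discovered but left out of the selection, which mirrors the choice of onboarding only the datasets you want to monitor or validate.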

How you onboard datasets depends on how you deploy Soda.


Choose your onboarding path

Soda supports two main ways to onboard datasets:

Via Soda Cloud

This approach is ideal if you want:

  • No-code dataset onboarding

  • Automated dataset discovery

  • Built-in metric monitoring

  • Data contracts created in the UI

  • Centralized scheduling and alerting

Via Soda Core

This approach is ideal if you want:

  • Full control in code

  • Git-based workflows

  • CI/CD integration

  • Contract verification inside pipelines

  • Support for in-memory datasets

You can also combine both approaches: use Soda Cloud for centralized observability and governance, and Soda Core for pipeline-level validation.
