Double-onboard a data source

Learn how to double-onboard a data source to leverage all the features supported by Soda Agents.

To scan your data for quality, Soda must connect to a data source using connection configurations (host, port, login credentials, etc.). You define these configurations either in Soda Cloud, when you onboard the data source using a Soda Agent, or in a configuration YAML file that you reference during programmatic or CLI scans using Soda Library. Soda recognizes each data source you onboard as an independent resource in Soda Cloud, where it displays all scan results and failed row samples for all data sources regardless of onboarding method.
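As an illustration of the Soda Library path described above, the following is a minimal sketch of a programmatic scan in Python. It assumes Soda Library is installed and uses its documented `Scan` class; the function name `run_library_scan` and the data source and file names in the usage note are hypothetical placeholders, not values from this page.

```python
def run_library_scan(data_source_name, configuration_path, checks_path):
    """Run one Soda Library scan and return its exit code."""
    # Imported inside the function so this module still loads
    # in environments where Soda Library is not installed.
    from soda.scan import Scan

    scan = Scan()
    # The name must match a data source defined in the configuration YAML.
    scan.set_data_source_name(data_source_name)
    # Connection details (host, port, credentials) live in this file.
    scan.add_configuration_yaml_file(file_path=configuration_path)
    # SodaCL checks to evaluate during the scan.
    scan.add_sodacl_yaml_file(checks_path)

    exit_code = scan.execute()
    print(scan.get_logs_text())
    return exit_code
```

For example, `run_library_scan("my_datasource", "configuration.yml", "checks.yml")` would scan a data source named `my_datasource`, assuming both files exist and that name is defined in the configuration file.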

However, data sources that you connect via a Soda Agent using the guided workflow in Soda Cloud support several features that data sources connected via Soda Library do not, including:

no-code checks
Discussions
scan scheduling
anomaly dashboards (available in 2025)

If you have onboarded a data source via Soda Library but you wish to take advantage of the features available to Soda Agent-onboarded data sources, you can double-onboard an existing data source.

See also: Soda overview
See also: Choose a flavor of Soda
See also: Add a new data source in Soda Cloud

Prerequisites

You installed Soda Library, you have configured it to connect to your data source, and you have run at least one scan programmatically or via the Soda Library CLI.
You have deployed a self-hosted Soda Agent Helm chart in a Kubernetes cluster in your cloud services environment, OR someone with Soda Admin privileges in your organization’s Soda Cloud account has navigated to your avatar > Organization Settings and checked the box to Enable Soda-hosted Agent; see Set up a Soda-hosted agent.
You have access to the connection configurations (host, port, login credentials, etc.) for your data source.
Your data source is compatible with a Soda Agent; see the compatible data sources listed below for each agent type.

Self-hosted agent

Amazon Athena
Amazon Redshift
Azure Synapse
ClickHouse
Databricks SQL
Denodo
Dremio
DuckDB
GCP BigQuery
Google CloudSQL
IBM DB2
MotherDuck
MS SQL Server¹
MySQL
OracleDB
PostgreSQL
Presto
Snowflake
Trino
Vertica

¹ MS SQL Server with Windows Authentication does not work with Soda Agent out-of-the-box.

Soda-hosted agent

BigQuery
Databricks SQL
MS SQL Server
MySQL
PostgreSQL
Redshift
Snowflake
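The compatibility information above can be expressed as a small lookup, sketched here in Python. The data source names are copied from the two lists as they appear on this page (note that they name BigQuery and Redshift slightly differently); the function name `agent_options` is hypothetical, and the lookup does not capture the Windows Authentication caveat for MS SQL Server.

```python
# Data sources listed on this page for each agent type.
SELF_HOSTED = {
    "Amazon Athena", "Amazon Redshift", "Azure Synapse", "ClickHouse",
    "Databricks SQL", "Denodo", "Dremio", "DuckDB", "GCP BigQuery",
    "Google CloudSQL", "IBM DB2", "MotherDuck", "MS SQL Server", "MySQL",
    "OracleDB", "PostgreSQL", "Presto", "Snowflake", "Trino", "Vertica",
}
SODA_HOSTED = {
    "BigQuery", "Databricks SQL", "MS SQL Server", "MySQL",
    "PostgreSQL", "Redshift", "Snowflake",
}


def agent_options(data_source: str) -> list[str]:
    """Return the agent types whose list includes the given data source name."""
    name = data_source.strip().lower()
    options = []
    if name in {s.lower() for s in SELF_HOSTED}:
        options.append("self-hosted")
    if name in {s.lower() for s in SODA_HOSTED}:
        options.append("Soda-hosted")
    return options
```

An empty result means the name does not appear in either list exactly as written, so check the spelling against the lists before concluding that the data source is incompatible.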

Onboard an existing data source

Log in to Soda Cloud, then navigate to your avatar > Data Sources.
From the list of data sources connected to your Soda Cloud account, click to select and open the one you onboarded via Soda Library and now wish to double-onboard via a Soda Agent.
Follow the guided workflow to onboard the existing data source via a Soda Agent, starting by using the dropdown to select the Default Scan Agent you wish to use to connect to the data source.
Complete the guided steps to:

define a schedule for your default scan definition
provide connection configuration details for the data source such as name, schema, and login credentials, and test the connection to the data source
profile the datasets in the data source to gather basic metadata about the contents of each
identify the datasets to which you wish to apply automated monitoring for anomalies and schema changes
assign ownership roles for the data source and its datasets

Save your changes, then navigate to the Datasets page and select a dataset in the data source you just double-onboarded.
(Optional) If you have requested preview access for the feature, follow the instructions to activate the anomaly dashboard for the dataset.
(Optional) Click Add Check and begin adding no-code checks to the dataset.

Known issue: Double-onboarding a data source renders Soda Library API keys invalid. After double-onboarding a data source, if you run a programmatic or CLI scan of that data source using Soda Library, an error indicates that the API keys are invalid. As a workaround, generate new API keys in Soda Cloud, then, in your configuration YAML, replace the old API key values with the newly generated ones.
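The key replacement in the workaround can be scripted. The following is a minimal Python sketch that assumes your configuration YAML stores the keys under the `api_key_id` and `api_key_secret` entries of a `soda_cloud` block and that these key names appear nowhere else in the file. The function name `replace_api_keys` is hypothetical, and the sketch edits text only; it does not generate keys or contact Soda Cloud.

```python
import re


def replace_api_keys(config_text: str, new_key_id: str, new_key_secret: str) -> str:
    """Return the configuration text with both API key values replaced."""
    replacements = {"api_key_id": new_key_id, "api_key_secret": new_key_secret}
    for key, value in replacements.items():
        # Match "<indent>key: <old value>" on a single line, keeping the prefix.
        pattern = re.compile(rf"^([ \t]*{key}[ \t]*:[ \t]*).*$", flags=re.MULTILINE)
        # A function replacement avoids backslash handling in the new value.
        config_text, count = pattern.subn(
            lambda m, v=value: m.group(1) + v, config_text
        )
        if count == 0:
            raise ValueError(f"{key} not found in configuration")
    return config_text
```

To use it, read your configuration file into a string, pass in the key values you generated in Soda Cloud, and write the result back to the file. Keep a backup of the original file, because the sketch rewrites every line that matches either key name.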

Go further

Learn more about automating anomaly detection for observability.

Need help? Join the Soda community on Slack.
