Onboard data sources & datasets via Soda Core

Go to the data source reference for Soda Core for more in-depth details about supported data sources and connection parameters.

A data source can be partially onboarded programmatically via Soda Core. Once a data source is fully onboarded via Soda Cloud, the engineering team can onboard datasets programmatically.

In a nutshell, a dataset is pushed to Soda Cloud by:

- Creating a contract YAML file that points to that dataset
- Pushing the contract's verification results to Soda Cloud

Soda does not support pushing a full data source configuration programmatically. This flow will create an empty data source that needs to be configured later in Soda Cloud.

Partially onboard a data source via CLI

This flow shows how to partially onboard a data source programmatically and then finish setting it up in Soda Cloud.

1. Set up your connections

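Follow the CLI reference to set up your connections. The commands in the next steps use a data source configuration file (`ds_config.yml`) and a Soda Cloud configuration file (`sc_config.yml`).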
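Before running any `soda` command, a quick preflight check can catch a missing CLI installation or configuration file early. The sketch below is not part of Soda: the `preflight` function and the `SODA` override variable are hypothetical, and the file names are the ones used throughout this guide.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: confirm the CLI is installed and that every
# configuration file passed as an argument exists. SODA can be overridden
# (it defaults to "soda"); both names are assumptions, not Soda features.
set -euo pipefail

SODA="${SODA:-soda}"

preflight() {
  local missing=0 f
  # Is the CLI on PATH?
  if ! command -v "$SODA" >/dev/null 2>&1; then
    echo "missing CLI: $SODA" >&2
    missing=1
  fi
  # Does every configuration file exist?
  for f in "$@"; do
    if [[ ! -f "$f" ]]; then
      echo "missing file: $f" >&2
      missing=1
    fi
  done
  # 0 when everything is in place, 1 otherwise.
  return "$missing"
}
```

For example, `preflight ds_config.yml sc_config.yml` returns a non-zero status and lists anything missing on stderr.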

2. Create a contract

Run the following:
```
soda contract create --dataset datasource/db/schema/table --file contract.yaml --data-source ds_config.yml --soda-cloud sc_config.yml
```
Unlike the `create` command in the CLI reference, this command must not use the `--use-agent` flag when you push a new data source via CLI.
The `--use-agent` flag makes the CLI look for the data source in Soda Cloud, where it has not been set up yet.
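The `--dataset` value joins the data source name, its prefixes (for example database and schema) and the table name with `/`, as in `CLI_testing/postgres/aldi_local/retail_orders` from the sample output on this page. The helper below is a hypothetical convenience for building that value in scripts; because the number of prefixes depends on the data source, it accepts any number of parts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a --dataset value such as
# CLI_testing/postgres/aldi_local/retail_orders from its parts.
set -euo pipefail

dataset_path() {
  # Need at least a data source name and a dataset name.
  if (( $# < 2 )); then
    echo "usage: dataset_path DATA_SOURCE [PREFIX...] DATASET" >&2
    return 1
  fi
  # "$*" joins all arguments with the first character of IFS.
  local IFS=/
  printf '%s\n' "$*"
}
```

For example: `soda contract create --dataset "$(dataset_path CLI_testing postgres aldi_local retail_orders)" --file contract.yaml --data-source ds_config.yml --soda-cloud sc_config.yml`, still without `--use-agent`.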

When should I use the `--use-agent` flag?

In some organizations, the data source admin creates the data source connection in Soda Cloud without onboarding any datasets. Engineers can then create and publish new contracts to onboard datasets via CLI. Because the data source already exists in Soda Cloud, use the `--use-agent` flag so that the datasets are pushed from Soda Core to the existing data source through the Agent.
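To make the choice between the two flows explicit in a script, the sketch below switches on a hypothetical `DS_IN_CLOUD` variable. It assumes that with `--use-agent` the Agent supplies the connection, so the local `--data-source` file is dropped while `--soda-cloud` is still passed; both points are assumptions to confirm against the CLI reference. `create_contract` and the `SODA` override are also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick the `soda contract create` flags depending on whether the data
# source already exists in Soda Cloud. DS_IN_CLOUD, SODA and create_contract
# are hypothetical names.
# ASSUMPTION: with --use-agent the Agent provides the connection, so
# --data-source is omitted; check the exact flag set in the CLI reference.
set -euo pipefail

create_contract() {
  local dataset="$1"
  local args=(contract create --dataset "$dataset" --file contract.yaml)
  if [[ "${DS_IN_CLOUD:-0}" == 1 ]]; then
    # The admin already created the data source in Soda Cloud: go through the Agent.
    args+=(--use-agent)
  else
    # The data source is not in Soda Cloud yet: connect locally, never --use-agent.
    args+=(--data-source ds_config.yml)
  fi
  args+=(--soda-cloud sc_config.yml)
  # SODA defaults to the real CLI; SODA=echo prints the command instead.
  "${SODA:-soda}" "${args[@]}"
}
```

Passing the arguments as an array keeps paths with spaces intact, which a command assembled as a single string would not.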

3. Test, verify & publish the contract

Test that the contract is correct:
```
soda contract test --contract contract.yaml
```

Verify the contract:
```
soda contract verify --contract contract.yaml --data-source ds_config.yml
```

Publish the contract:
```
soda contract publish --contract contract.yaml --soda-cloud sc_config.yml
```
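These three commands run in order. A minimal wrapper, assuming the `soda` CLI exits with a non-zero status when a step fails, can chain them so that nothing is published after a failed test or verification. Gating publish on the earlier steps is a design choice of this sketch, not a documented Soda requirement; `onboard_contract` and the `SODA` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run test -> verify -> publish and stop at the first failing step.
# Assumes the soda CLI exits non-zero on failure. onboard_contract and the
# SODA override (e.g. SODA=echo for a dry run) are hypothetical.
set -euo pipefail

SODA="${SODA:-soda}"

onboard_contract() {
  local contract="${1:-contract.yaml}"
  # 1. Test that the contract is correct.
  "$SODA" contract test --contract "$contract" || return
  # 2. Verify the contract against the data source.
  "$SODA" contract verify --contract "$contract" --data-source ds_config.yml || return
  # 3. Publish to Soda Cloud only after both previous steps succeeded.
  "$SODA" contract publish --contract "$contract" --soda-cloud sc_config.yml
}
```

Each step ends with `|| return` instead of relying on `set -e` alone, because Bash ignores `set -e` inside a function that is called as part of a condition, such as `onboard_contract || exit 1`.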

After successfully publishing the contract, you will see output similar to the following:

```
  __|  _ \|  \   \
\__ \ (   |   | _ \
____/\___/___/_/  _\ CLI v4.0.4b24
Fetching datasets configurations from Soda Cloud for datasets '[DatasetIdentifier(data_source='CLI_testing', prefixes=['postgres', 'aldi_local'], dataset='retail_orders')]'
Verifying contract 📜 contract.yaml 🤞

### Contract results for CLI_testing/postgres/aldi_local/retail_orders
+-----------------+-----------------------------------+-------------------------------+-----------+---------------+
| Column          | Check                             | Threshold                     | Outcome   | Diagnostics   |
+=================+===================================+===============================+===========+===============+
| [dataset-level] | Schema matches expected structure | level: fail                   | ✅ PASSED |               |
|                 |                                   | must be less than or equal: 0 |           |               |
+-----------------+-----------------------------------+-------------------------------+-----------+---------------+
# Summary:
|----------------|---|----|
| Checks         | 1 |    |
| Passed         | 1 | ✅ |
| Failed         | 0 | ✅ |
| Warned         | 0 | ✅ |
| Not Evaluated  | 0 | ✅ |
| Excluded       | 0 | ✅ |
| Runtime Errors | 0 | ✅ |

👌 Results sent to Soda Cloud
To view the dataset on Soda Cloud, see https://cloud.us.soda.io/o/<datasetID>/datasets/ab1bc55d-c49a-441e-a0dc-c01857c71b21
Updating post processing stage 'diagnosticWarehouse' to state 'completed' for scan <scanID>
Updated post processing stage 'diagnosticWarehouse' to state 'completed' for scan <scanID>
```
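In a CI job it can be handy to surface the Soda Cloud link from this output. The sketch below extracts it with `sed`, assuming the line keeps the form `To view the dataset on Soda Cloud, see <url>`; the wording of CLI output may change between versions, and the `dataset_url` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read CLI output on stdin and print the Soda Cloud
# dataset URL. ASSUMPTION: the output contains a line of the form
# "To view the dataset on Soda Cloud, see <url>". Prints nothing if absent.
set -euo pipefail

dataset_url() {
  sed -n 's|.*To view the dataset on Soda Cloud, see \(https://[^ ]*\).*|\1|p'
}
```

For example, pipe the publish command's output through `tee publish.log` and then run `dataset_url < publish.log`.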

Complete onboarding via Soda Cloud

A data source must be connected to a Soda Agent before you can use contract verification in Soda Cloud. You set up the connection to the Soda Agent in Soda Cloud.

Until that connection is in place, Soda Cloud shows a warning if you attempt to use contract verification or to access Metric Monitoring capabilities.

To finish onboarding, complete the connection in Soda Cloud.
