Integrate Soda with dbt
Last modified on 20-Nov-24
Integrate Soda with dbt to access dbt test results from within your Soda Cloud account.
Use Soda Library to ingest the results of your dbt tests and push them to Soda Cloud so you can leverage features such as:
- visualizing your data quality over time
- setting up alert notifications for your team when dbt tests fail
- creating and tracking data quality incidents
✖️ Requires Soda Core Scientific (included in a Soda Agent)
✔️ Supported in Soda Core
✔️ Supported in Soda Library + Soda Cloud
✖️ Supported in Soda Cloud + Self-hosted Soda Agent
✖️ Supported in Soda Cloud + Soda-hosted Agent
Prerequisites
Videos
Ingest dbt test results from dbt-core into Soda Cloud
Ingest results from dbt Cloud into Soda Cloud
Ingestion notes and constraints
View dbt test results in Soda Cloud
Go further
Prerequisites
- You have installed a Soda Library package in your environment and have configured it to connect to a data source and your Soda Cloud account using a configuration.yml file.
- You use dbt Cloud or dbt-core version 1.5 or 1.6. Note: As dbt no longer supports v1.4, Soda does not support that version.
Videos
Integrate dbt core with Soda.
Integrate dbt Cloud with Soda.
Ingest dbt test results from dbt-core into Soda Cloud
Every time you execute tests in dbt, dbt captures information about the test results. Soda Library can access this information and translate it into test results that Soda Cloud can display. You must first run your tests in dbt before Soda Library can find and translate test results, then push them to Soda Cloud.
- If you have not already done so, install one of the supported soda-dbt sub-packages in the Python environment that also runs your Soda Library package.
  pip install -i https://pypi.cloud.soda.io "soda-dbt[v15]"
  # OR
  pip install -i https://pypi.cloud.soda.io "soda-dbt[v16]"
- Run your dbt pipeline using one of the following commands:
- To ingest dbt test results, Soda Library uses the files that dbt generates when it builds or tests models: manifest.json and run_results.json. Use Soda Library to execute one of the following ingest commands to ingest the JSON files into Soda Cloud.
  - Specify the file path for the directory in which you store both the manifest.json and run_results.json files; Soda finds the files it needs in this directory.
    soda ingest dbt -d my_datasource_name --dbt-artifacts /path/to/files
  OR
  - Specify the path and filename for each individual JSON file that Soda Cloud must ingest.
    soda ingest dbt -d my_datasource_name --dbt-manifest path/to/manifest.json --dbt-run-results path/to/run_results.json
Run soda ingest --help to review a list of all command options.
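The directory-based ingest above can be wrapped in a small pre-flight check. The following is a minimal sketch, not part of Soda Library: the helper name build_ingest_command is hypothetical, and it assumes that dbt wrote its artifacts to a single directory (dbt uses target/ in the project root by default). It only verifies that both JSON files exist and assembles the documented command.

```python
from pathlib import Path

# The two artifacts that Soda Library reads, per the step above.
REQUIRED_ARTIFACTS = ("manifest.json", "run_results.json")


def build_ingest_command(data_source: str, artifacts_dir: str) -> list[str]:
    """Return the argv for `soda ingest dbt` after checking both artifacts exist.

    Hypothetical helper: it does not call Soda itself, it only assembles
    the command shown in the documentation.
    """
    directory = Path(artifacts_dir)
    missing = [name for name in REQUIRED_ARTIFACTS if not (directory / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"Missing dbt artifacts in {directory}: {', '.join(missing)}"
        )
    return ["soda", "ingest", "dbt", "-d", data_source, "--dbt-artifacts", str(directory)]
```

To execute the result, you could pass it to subprocess.run(command, check=True) immediately after your dbt command finishes, so that a failed dbt run with no artifacts stops the pipeline before the ingest.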
Ingest results from dbt Cloud into Soda Cloud
Every run that is part of a Job on dbt Cloud generates metadata about your dbt project as well as the results from the run. Use Soda Library to get this data directly from the dbt Cloud API.
Note that you must use Soda Library to run the CLI command that ingests dbt test results from dbt Cloud into Soda Cloud. You cannot configure the connection to dbt Cloud from within the Soda Cloud user interface as you would, for example, a new data source.
- If you have not already done so, install the soda-dbt sub-package in the Python environment that also runs your Soda Library package by running the following command.
  pip install -i https://pypi.cloud.soda.io soda-dbt
- Obtain a dbt Cloud Admin API Service Token.
- Add the following configuration to your Soda configuration.yml file. Look for the account ID after the word accounts in a dbt Cloud URL, for example https://cloud.getdbt.com/#/accounts/840923545***/, or navigate to your dbt Cloud Account Settings page.
  dbt_cloud:
    account_id: account_id
    api_token: serviceAccountTokenFromDbt1234
  Note that as of March 1, 2024, dbt Cloud users must use region-specific access URLs for API connections. Because the Soda integration with dbt Cloud interacts with dbt’s Admin API, you may have to specify the base URL of the Admin API via the access_url property, as in the example below. Find your access URL in your dbt Cloud account in Account Settings. If you do not provide this in your configuration, Soda defaults to "cloud.getdbt.com". Find out more in Access, Regions & IP Addresses.
  dbt_cloud:
    account_id: account_id
    api_token: serviceAccountTokenFromDbt1234
    access_url: ab123.us1.dbt.com
- From the command-line, run the soda ingest command to capture the test results from dbt Cloud and send them to Soda Cloud, including one of two identifiers from dbt Cloud. Refer to dbt Cloud documentation for more information.
  - Use the run ID from which you want Soda to ingest results. Look for the run ID at the top of any Run page (“Run #40732579”) in dbt Cloud, or in the URL of the Run page. For example, https://cloud.getdbt.com/#/accounts/1234/projects/1234/runs/40732579/
    soda ingest dbt -d my_datasource_name -c configuration.yml --dbt-cloud-run-id the_run_id
  OR
  - Use the job ID from which you want Soda to ingest results. Using the job ID enables you to write the command once and know that Soda always ingests the latest run of the job, which is ideal if you perform ingests on a regular schedule via a cron job or other scheduler. Look for the job ID after the word “jobs” in the URL of the Job page in dbt Cloud. For example, https://cloud.getdbt.com/#/accounts/1234/projects/5678/jobs/123445/
    soda ingest dbt -d my_datasource_name -c configuration.yml --dbt-cloud-job-id the_job_id
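If you copy run or job URLs out of dbt Cloud often, you can extract the IDs programmatically. The sketch below is illustrative only: parse_dbt_cloud_url and build_cloud_ingest_command are hypothetical helpers, not Soda or dbt APIs, and they assume the URL follows the /accounts/…/projects/…/runs/… or /jobs/… layout shown in the examples above.

```python
import re

# Matches the "/<kind>/<digits>" segments seen in the example dbt Cloud URLs.
_ID_PATTERN = re.compile(r"/(accounts|projects|runs|jobs)/(\d+)")


def parse_dbt_cloud_url(url: str) -> dict[str, str]:
    """Map each of accounts/projects/runs/jobs found in the URL to its numeric ID."""
    return {kind: value for kind, value in _ID_PATTERN.findall(url)}


def build_cloud_ingest_command(
    data_source: str, url: str, config_path: str = "configuration.yml"
) -> list[str]:
    """Assemble the documented `soda ingest dbt` argv from a Run or Job page URL."""
    ids = parse_dbt_cloud_url(url)
    if "runs" in ids:
        flag, value = "--dbt-cloud-run-id", ids["runs"]
    elif "jobs" in ids:
        flag, value = "--dbt-cloud-job-id", ids["jobs"]
    else:
        raise ValueError("URL contains neither a run ID nor a job ID")
    return ["soda", "ingest", "dbt", "-d", data_source, "-c", config_path, flag, value]
```

For a scheduled ingest, a Job page URL is the better input, because the resulting command uses the job ID and therefore always targets the latest run of that job.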
Ingestion notes and constraints
- When you call the ingestion integration, Soda Library reads the information from manifest.json and run_results.json files (or gets them from the dbt Cloud API), then maps the information onto the corresponding datasets in Soda Cloud. If the mapping fails, Soda Library creates a new dataset and Soda Cloud displays the dbt monitor results associated with the new dataset.
- In Soda Cloud, the displayed scan time of a dbt test is the time that Soda Library ingested the test result from dbt. The scan time in Soda Cloud does not represent the time that the dbt pipeline executed the test. If you want those times to be close to each other, we recommend running a soda ingest right after your dbt transformation or testing pipeline has completed.
- The command soda scan cannot trigger a dbt run, and the command dbt run cannot trigger a Soda scan. You must execute Soda scans and dbt runs individually, then ingest the results from a dbt run into Soda by explicitly executing a soda ingest command.
- Soda can ingest dbt tests that:
  - have test metadata (test_metadata in the test node json)
  - have a run result
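To preview which tests meet the two conditions above before you ingest, you can inspect the artifacts yourself. This is a rough sketch and not Soda's actual selection logic, which this page does not show. It assumes the standard dbt artifact layout: manifest.json keeps nodes under a top-level nodes key with a resource_type field, and run_results.json keeps a results list whose entries carry a unique_id. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_artifact(path: str) -> dict:
    """Read one dbt artifact (manifest.json or run_results.json) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def ingestible_tests(manifest: dict, run_results: dict) -> list[str]:
    """Return sorted unique_ids of test nodes that have test_metadata and a run result."""
    ids_with_results = {r["unique_id"] for r in run_results.get("results", [])}
    return sorted(
        unique_id
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "test"
        and node.get("test_metadata")       # condition 1: test metadata present
        and unique_id in ids_with_results   # condition 2: a run result exists
    )
```

A test that appears in the manifest but not in this list either lacks test_metadata in its node or did not execute in the run that produced run_results.json.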
View dbt test results in Soda Cloud
After completing the steps above to ingest dbt tests, log in to your Soda Cloud account, then navigate to the Checks dashboard.
Each row in the table of Checks represents a check that Soda Library executed, or a dbt test that Soda Library ingested. dbt tests are prefixed with dbt: in the table of Checks.
- Click the row of a dbt test to examine visualized historic data for the test, details of the results, and information that can help you diagnose a data quality issue.
- Click the stacked dots at the far right of a dbt check, then select Create Incident to begin investigating a data quality issue with your team.
- Set up an alert notification rule for checks with fail or warn results. Navigate to your avatar > Notification Rules, then click New Notification Rule. Follow the guided steps to complete the new rule. Send notifications to an individual or a team in Slack.
Go further
- Learn more about How Soda works.
- Read more about running a Soda scan.
- As a business user, learn how to create no-code checks in Soda Cloud.
- Learn more about creating, tracking, and resolving data quality incidents in Soda Cloud.
- Need help? Join the Soda community on Slack.
- Access a list of all integrations that Soda Cloud supports.
Documentation always applies to the latest version of Soda products