
Integrate Soda Core with dbt

Last modified on 06-Dec-22

Integrate Soda with dbt to access dbt test results from within your Soda Cloud account.

Use Soda Core to ingest the results of your dbt tests and push them to Soda Cloud so you can leverage features such as:

  • visualizing your data quality over time
  • setting up alert notifications for your team when dbt tests fail
  • creating and tracking data quality incidents

Prerequisites
Videos
Ingest dbt test results from dbt-core into Soda Cloud
Ingest results from dbt Cloud into Soda Cloud
Ingestion notes and constraints
View dbt test results in Soda Cloud
Go further


Prerequisites

  • You have created a Soda Cloud account with Admin, Manager, or Editor permissions.
  • You have installed a Soda Core package in your environment and configured it (see /soda-core/configuration.html) to connect to a data source using a configuration.yml file.
  • You have connected your Soda Cloud account to Soda Core.
  • You use either the open-source dbt-core, version 1.0.0 or later, or dbt Cloud.
  • You have installed the optional soda-core-dbt sub-package in the Python environment that also runs soda-core by running pip install soda-core-dbt.
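If you use dbt-core, you can confirm the version and sub-package prerequisites from the same Python environment that runs soda-core. The following is a minimal sketch using the standard library's importlib.metadata; it assumes the packages are installed under the distribution names "dbt-core" and "soda-core-dbt" (the names used with pip install). The helper names are hypothetical, and the dbt-core check does not apply if you use dbt Cloud.

```python
"""Check the dbt integration prerequisites in the current Python environment.

Assumption: the distributions are named "dbt-core" and "soda-core-dbt".
"""
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '1.3.2' into (1, 3, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum=(1, 0, 0)) -> bool:
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= tuple(minimum)


def check_prerequisites() -> list:
    """Return a list of problems; an empty list means both packages look fine."""
    problems = []
    try:
        dbt_version = metadata.version("dbt-core")
        if not meets_minimum(dbt_version):
            problems.append(f"dbt-core {dbt_version} is older than 1.0.0")
    except metadata.PackageNotFoundError:
        problems.append("dbt-core is not installed in this environment")
    try:
        metadata.version("soda-core-dbt")
    except metadata.PackageNotFoundError:
        problems.append("soda-core-dbt is missing; run: pip install soda-core-dbt")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```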

Videos

Integrate dbt-core with Soda.


Integrate dbt Cloud with Soda.

Ingest dbt test results from dbt-core into Soda Cloud

Every time you execute tests in dbt, dbt captures information about the test results. Soda Core can access this information and translate it into test results that Soda Cloud can display. You must run your tests in dbt first; only then can Soda Core find and translate the test results and push them to Soda Cloud.
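To see the kind of information dbt captures, you can summarize the run_results.json file that dbt writes after a run. This is a minimal sketch, assuming dbt's default target/ directory and dbt's artifact schema, in which each entry in "results" carries a "unique_id" (test IDs start with "test.") and a "status" such as pass, fail, warn, error, or skipped. The function name is hypothetical and is not part of Soda Core.

```python
"""Summarize the dbt test results that Soda Core later translates.

Assumption: run_results.json follows dbt's artifact schema and lives in target/.
"""
import json
from collections import Counter
from pathlib import Path


def summarize_run_results(path) -> Counter:
    """Count dbt test results by status, skipping models and other node types."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return Counter(
        result["status"]
        for result in data.get("results", [])
        if result.get("unique_id", "").startswith("test.")
    )


if __name__ == "__main__":
    print(summarize_run_results("target/run_results.json"))
```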

  1. If you have not already done so, install the soda-core-dbt sub-package in the Python environment that also runs soda-core by running pip install soda-core-dbt.
  2. Run your dbt pipeline using one of the following commands:
      dbt build

      OR

      dbt test
  3. To ingest dbt test results, Soda Core uses the files that dbt generates when it builds or tests models: manifest.json and run_results.json. Use Soda Core to execute one of the following ingest commands to ingest the JSON files into Soda Cloud.
    • Specify the file path for the directory in which you store both the manifest.json and run_results.json files; Soda finds the files it needs in this directory.
      soda ingest dbt -d my_datasource_name --dbt-artifacts /path/to/files
      

      OR

    • Specify the path and filename for each individual JSON file that Soda Cloud must ingest.
      soda ingest dbt -d my_datasource_name --dbt-manifest path/to/manifest.json --dbt-run-results path/to/run_results.json
      

Run soda ingest --help to review a list of all command options.
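Because soda ingest needs both manifest.json and run_results.json, it can help to confirm that the artifacts directory contains both files before calling it. The sketch below builds the --dbt-artifacts form of the command as an argument list; it assumes dbt wrote its artifacts to target/ (dbt's default) and uses "my_datasource_name" as a placeholder, as in the steps above. The helper name is hypothetical.

```python
"""Build the soda ingest command for dbt-core artifacts, checking the files first.

Assumptions: artifacts are in target/; "my_datasource_name" is a placeholder.
"""
import shlex
from pathlib import Path

REQUIRED_ARTIFACTS = ("manifest.json", "run_results.json")


def build_ingest_command(datasource: str, artifacts_dir) -> list:
    """Return the argument list for `soda ingest dbt --dbt-artifacts ...`.

    Raises FileNotFoundError if either artifact is missing, which usually means
    dbt has not run yet or wrote to a different target path.
    """
    directory = Path(artifacts_dir)
    missing = [name for name in REQUIRED_ARTIFACTS if not (directory / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{directory} is missing: {', '.join(missing)}")
    return ["soda", "ingest", "dbt", "-d", datasource, "--dbt-artifacts", str(directory)]


if __name__ == "__main__":
    command = build_ingest_command("my_datasource_name", "target")
    # Print a copy-pasteable command; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```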


Ingest results from dbt Cloud into Soda Cloud

Every run that is part of a Job on dbt Cloud generates metadata about your dbt project as well as the results from the run. Use Soda Core to get this data directly from the dbt Cloud API.

  1. If you have not already done so, install the soda-core-dbt sub-package in the Python environment that also runs soda-core by running pip install soda-core-dbt.
  2. Obtain a dbt Cloud Admin API Service Token.
  3. Add the following configuration to your Soda configuration.yml file. Find the account ID after the word “accounts” in a dbt Cloud URL. For example, https://cloud.getdbt.com/#/accounts/840923545***/
    dbt_cloud:
      account_id: account_id
      api_token: serviceAccountTokenFromDbt1234
    
  4. From the command line, run the soda ingest command to capture the test results from dbt Cloud and send them to Soda Cloud. Include one of the two dbt Cloud identifiers below; refer to the dbt Cloud documentation for more information.
    • Use the run ID from which you want Soda to ingest results.
      Look for the run ID at the top of any Run page in dbt Cloud (for example, “Run #40732579”) or in the URL of the Run page. For example, https://cloud.getdbt.com/#/accounts/1234/projects/1234/runs/40732579/
      soda ingest dbt -d my_datasource_name -c configuration.yml --dbt-cloud-run-id the_run_id
      

      OR

    • Use the job ID from which you want Soda to ingest results. Using the job ID enables you to write the command once and know that Soda always ingests the latest run of the job, which is ideal if you perform ingests on a regular schedule via a cron job or other scheduler.
      Look for the job ID after the word “jobs” in the URL of the Job page in dbt Cloud. For example, https://cloud.getdbt.com/#/accounts/1234/projects/5678/jobs/123445/
      soda ingest dbt -d my_datasource_name -c configuration.yml --dbt-cloud-job-id the_job_id
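The account, job, and run IDs all appear as numeric path segments in dbt Cloud URLs. As a convenience, the following sketch extracts them with a regular expression; it assumes URLs have the shapes shown in the steps above, and the function name is hypothetical.

```python
"""Extract dbt Cloud account, project, job, and run IDs from a dbt Cloud URL.

Assumption: URLs look like https://cloud.getdbt.com/#/accounts/1234/projects/5678/jobs/123445/
"""
import re

_ID_PATTERN = re.compile(r"/(accounts|projects|jobs|runs)/(\d+)")


def parse_dbt_cloud_url(url: str) -> dict:
    """Map each path segment (accounts, projects, jobs, runs) to its numeric ID."""
    return {segment: value for segment, value in _ID_PATTERN.findall(url)}


if __name__ == "__main__":
    ids = parse_dbt_cloud_url("https://cloud.getdbt.com/#/accounts/1234/projects/5678/jobs/123445/")
    print(ids["accounts"], ids["jobs"])
```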
      

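The dbt Cloud ingest command takes either a run ID or a job ID. The following sketch builds the command as an argument list and insists on exactly one of the two; that restriction is a design choice of this sketch, not documented soda CLI behavior. The flag names are those shown in step 4, while "my_datasource_name", configuration.yml, and the helper name are placeholders.

```python
"""Build the dbt Cloud ingest command with exactly one of a run ID or a job ID.

Assumptions: flags as shown in this guide; datasource and config path are placeholders.
"""
import shlex


def build_cloud_ingest_command(datasource, config_path="configuration.yml",
                               run_id=None, job_id=None) -> list:
    """Return the soda ingest arguments for a dbt Cloud run or job (not both)."""
    if (run_id is None) == (job_id is None):
        raise ValueError("pass exactly one of run_id or job_id")
    command = ["soda", "ingest", "dbt", "-d", datasource, "-c", config_path]
    if run_id is not None:
        command += ["--dbt-cloud-run-id", str(run_id)]
    else:
        command += ["--dbt-cloud-job-id", str(job_id)]
    return command


if __name__ == "__main__":
    # A fixed job ID suits scheduled ingests: Soda picks up the job's latest run.
    print(shlex.join(build_cloud_ingest_command("my_datasource_name", job_id=123445)))
```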

Ingestion notes and constraints

  • When you run soda ingest, Soda Core reads the information from the manifest.json and run_results.json files (or gets it from the dbt Cloud API), then maps the information onto the corresponding datasets in Soda Cloud. If the mapping fails, Soda Core creates a new dataset and Soda Cloud displays the dbt test results associated with the new dataset.
  • In Soda Cloud, the displayed scan time of a dbt test is the time that Soda Core ingested the test result from dbt, not the time that the dbt pipeline executed the test. To keep those two times close together, run soda ingest immediately after your dbt transformation or testing pipeline completes.
  • The command soda scan cannot trigger a dbt run, and the command dbt run cannot trigger a Soda scan. You must execute Soda scans and dbt runs individually, then ingest the results from a dbt run into Soda by explicitly executing a soda ingest command.

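Because dbt cannot trigger Soda and the scan time reflects when you ingest, one option is a small wrapper that runs your dbt tests and ingests the results right away. The sketch below assumes the dbt and soda executables are on PATH, artifacts land in target/, and "my_datasource_name" is a placeholder. It relies on dbt's documented exit codes: 0 for success, 1 when the run completes with handled errors such as failing tests (still worth ingesting), and 2 for an unhandled error. The runner parameter exists only so the sketch can be tested without dbt installed.

```python
"""Run dbt tests, then ingest their results into Soda Cloud straight away.

Assumptions: dbt and soda are on PATH; artifacts are in target/; placeholder datasource.
"""
import subprocess


def run_and_ingest(datasource, artifacts_dir="target", runner=subprocess.run) -> int:
    """Run `dbt test`, then `soda ingest dbt`; return the ingest exit code."""
    dbt = runner(["dbt", "test"], check=False)
    # Exit code 1 means some tests failed: those results are still worth ingesting.
    # Any other non-zero code (such as 2) signals an unhandled dbt error, so stop here.
    if dbt.returncode not in (0, 1):
        raise RuntimeError(f"dbt test exited with {dbt.returncode}; skipping ingest")
    ingest = runner(
        ["soda", "ingest", "dbt", "-d", datasource, "--dbt-artifacts", str(artifacts_dir)],
        check=False,
    )
    return ingest.returncode


if __name__ == "__main__":
    raise SystemExit(run_and_ingest("my_datasource_name"))
```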
View dbt test results in Soda Cloud

After completing the steps above to ingest dbt tests, log in to your Soda Cloud account, then navigate to the Check Results dashboard.

Each row in the table of Check Results represents the result of a check that Soda Core executed, or the result of a dbt test that Soda Core ingested. dbt test results are prefixed with dbt: in the table of Check Results.

  • Click the row of a dbt check result to examine visualized historic data for the test, details of the results, and information that can help you diagnose a data quality issue.
  • Click the stacked dots at the far right of a dbt check result, then select Create Incident to begin investigating a data quality issue with your team.
  • Click the stacked dots at the far right of a dbt check result, then select Edit Check to set up a notification that Soda Cloud sends when the dbt test fails. Send notifications to an individual or a team in Slack.

Go further

