
Soda Cloud resources

Soda Cloud is made up of several parts, or resources, that work together to define checks, execute scans, and display results that help you gauge the quality and reliability of your data.

It is helpful to understand these resources and how they relate, or connect, to each other if you are establishing role-based access rules for your organization’s Soda Cloud account, or if you are planning to delete an existing resource.

Example Soda Cloud deployment
Delete resources
Example deployment with Soda Cloud and Soda Core

Example Soda Cloud deployment

The following diagram illustrates an example deployment of a single Soda Cloud account with two Soda Agents, each of which connects to two data sources. A Soda Cloud Administrator has also created integrations with Slack, Jira (via a webhook), and MS Teams (coming soon).

example-deployment

  • A Soda Agent is Soda Core that has been deployed in a Kubernetes cluster in a cloud services provider environment. It enables Soda Cloud users to securely connect to data sources such as Snowflake, BigQuery, and PostgreSQL. Read more.
  • A data source in Soda Cloud is a representation of the connection to your data source. Notably, it does not contain any of your data, only data source metadata that it uses to check for data quality. Read more.
    Within the context of Soda Cloud, a data source contains:
    • datasets which represent tabular structures with rows and columns in your data source; like data sources, they do not contain your data, only metadata. Datasets can contain user-defined attributes that help filter and organize check results. Read more.
    • scan definitions which you use to define a Soda scan schedule for the data source. Read more.
    • agreements in which you write checks to define what good data looks like. Agreements also specify where to send alert notifications when a check result warns or fails, such as to a Slack channel in your organization. Read more.
  • An integration is a built-in Soda Cloud feature that enables you to connect with a third-party service provider, such as Slack. Read more.

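There is one exception to the rule that data sources and datasets in Soda Cloud hold only metadata: when you configure Soda Cloud to collect sample data from a dataset, or samples of failed rows from a dataset when a check result fails, Soda Cloud stores those sample rows.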

Delete resources

As the example deployment diagram illustrates, the resources in Soda Cloud have several connections to each other. You can delete resources in Soda Cloud responsibly, because it warns you about the relevant impact before executing a deletion, but it may help to visualize the effect a deletion has on your deployment before proceeding.

The following non-exhaustive list of example deletions serves to illustrate the potential impact of deleting a resource.

Delete a dataset

Deleting a dataset affects individual checks defined inside an agreement. If you have multiple agreements that contain checks against a particular dataset, all of those checks, and consequently the agreements they are in, are impacted when you delete the dataset. Further, if the dataset contains attributes, those attributes disappear with the dataset upon deletion.

delete-dataset

Delete a data source

Deleting a data source affects many other resources in Soda Cloud. As the following diagram illustrates, when you delete a data source, you delete all its datasets, scan definitions, agreements, and the checks in the agreements.

If an agreement contains a cross check that compares the row count of datasets between data sources (as does the agreement in Data source C in the diagram), deleting a data source affects more than the checks and agreements it contains.

delete-datasource
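
To make the cascade concrete, the following is a minimal Python sketch of the relationships described above. It is only an illustration and not part of any Soda API: the `Check` and `Agreement` classes, the `impact_of_deleting_data_source` function, and all resource names are hypothetical. It assumes each check records the datasets it reads as "data-source/dataset" strings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Check:
    # Hypothetical model of a check; not a Soda class.
    name: str
    datasets: Tuple[str, ...]  # "data-source/dataset" identifiers the check reads


@dataclass
class Agreement:
    # Hypothetical model of an agreement that lives in one data source.
    name: str
    data_source: str
    checks: List[Check] = field(default_factory=list)


def impact_of_deleting_data_source(agreements, data_source):
    """Return (agreements deleted with the data source,
    checks in OTHER data sources that reference its datasets)."""
    prefix = data_source + "/"
    deleted = [a.name for a in agreements if a.data_source == data_source]
    affected_elsewhere = [
        (a.name, c.name)
        for a in agreements
        if a.data_source != data_source
        for c in a.checks
        if any(d.startswith(prefix) for d in c.datasets)
    ]
    return deleted, affected_elsewhere


agreements = [
    Agreement("agreement-a", "data-source-a", [
        Check("row_count > 0", ("data-source-a/orders",)),
    ]),
    Agreement("agreement-c", "data-source-c", [
        Check("missing_count(last_name) < 10", ("data-source-c/customers",)),
        # A cross check reads datasets from two data sources.
        Check("cross check row_count",
              ("data-source-c/customers", "data-source-a/orders")),
    ]),
]

deleted, affected_elsewhere = impact_of_deleting_data_source(
    agreements, "data-source-a"
)
print(deleted)
print(affected_elsewhere)
```

In this toy model, deleting `data-source-a` removes its own agreement and also flags the cross check in the agreement that belongs to `data-source-c`, which mirrors the cross check case described above.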

Delete a scan definition

Deleting a scan definition has the potential to impact multiple agreements in a data source. Among other things, the scan definition defines the schedule that Soda Cloud uses to execute scans of data in the data source.

Any agreements that reference a deleted scan definition are no longer scanned for data quality. Consequently, your Check Results dashboard in Soda Cloud no longer displays check results for those agreements, and Soda Cloud no longer sends alert notifications for them.

delete-scan-def

Delete an integration

A Soda Cloud Administrator has the ability to add, edit, and delete integrations with third-party service providers.

As the example diagram indicates, deleting a Slack integration prevents Soda Cloud from sending alert notifications to Slack when check results warn or fail, and prevents users from connecting an incident to a Slack channel to collaborate on data quality issue resolution.

delete-integration

Example deployment with Soda Cloud and Soda Core

If your Soda Cloud account is also connected to Soda Core, your deployment may resemble the following diagram.

Note that you can delete resources that appear in Soda Cloud as a result of a manual or programmatic Soda Core scan. However, unless you delete the reference to the resource at its source – the checks.yml file or configuration.yml file – the resource will reappear in Soda Cloud when Soda Core sends its next set of scan results.

For example, imagine you use Soda Core to run scans and send results to Soda Cloud. In the checks.yml file that you use to define your checks, you have the following configuration:

checks for dataset-q:
  - missing_count(last_name) < 10

In Soda Cloud, you can see dataset-q because Soda Core pushed the scan results to Soda Cloud, which resulted in the creation of a resource for that dataset. You can use the Soda Cloud UI to delete dataset-q, but unless you also remove the checks for dataset-q configuration from your checks.yml file, the dataset reappears in Soda Cloud the next time you run a scan.

example-cloud-with-core
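
The reappearance behavior can be pictured with a small Python simulation. This is a toy model under stated assumptions, not how Soda Core or Soda Cloud is implemented: `run_core_scan` is a hypothetical function that treats Soda Cloud as a set of dataset names and treats a scan as re-creating every dataset named in checks.yml.

```python
def run_core_scan(cloud_datasets, checks_yml_datasets):
    """Hypothetical stand-in for a Soda Core scan that pushes results:
    every dataset named in checks.yml is created (or re-created) in Soda Cloud."""
    return set(cloud_datasets) | set(checks_yml_datasets)


# First scan: dataset-q appears in Soda Cloud.
cloud = run_core_scan(set(), ["dataset-q"])

# Delete dataset-q in the Soda Cloud UI only.
cloud.discard("dataset-q")

# checks.yml still contains "checks for dataset-q", so the next scan brings it back.
cloud = run_core_scan(cloud, ["dataset-q"])
reappeared = "dataset-q" in cloud

# Delete it again, and this time also remove the entry from checks.yml.
cloud.discard("dataset-q")
cloud = run_core_scan(cloud, [])
gone_for_good = "dataset-q" not in cloud

print(reappeared, gone_for_good)
```

The point of the sketch is that the source of truth for these resources is the checks.yml or configuration.yml file, so a deletion in the UI alone does not persist.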



Last modified on 30-Sep-22