
Glossary

Last modified on 25-Apr-24

active check

Soda’s licensing model is based on the volume of active checks. An active check is one that Soda has executed during a scan at least once in the past 90 days. A single check counts as one active check, whether it has been executed during a scan one time, fifty times, or five hundred times in the last 90 days.
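The counting rule above can be sketched in a few lines of Python. This is only an illustration, not how Soda computes licensing: the function name `count_active_checks`, the check identifiers, and the choice to treat the 90-day boundary as inclusive are all assumptions.

```python
from datetime import datetime, timedelta

# Window taken from the definition above: the past 90 days.
ACTIVE_WINDOW = timedelta(days=90)


def count_active_checks(executions, now):
    """Count distinct checks executed at least once within the window.

    `executions` is an iterable of (check_id, executed_at) pairs.
    The inclusive boundary (>=) is an assumption made for this sketch.
    """
    cutoff = now - ACTIVE_WINDOW
    # A set collapses repeated executions of the same check into one entry.
    active = {check_id for check_id, executed_at in executions if executed_at >= cutoff}
    return len(active)


now = datetime(2024, 4, 25)
executions = [
    ("row_count_orders", now - timedelta(days=1)),
    ("row_count_orders", now - timedelta(days=2)),  # same check, counted once
    ("missing_email", now - timedelta(days=30)),
    ("stale_check", now - timedelta(days=120)),     # outside the window
]
print(count_active_checks(executions, now))  # 2
```

The check that ran twice counts once, and the check last executed 120 days ago does not count at all.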

agreement

(Deprecating) A collection of checks that serve as a contract between stakeholders that stipulates the expected and agreed-upon state of data quality in a data source.

alert configuration

A configuration in a SodaCL check that you use to explicitly specify the conditions that warrant a warn result. See Optional check configurations.
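As a rough illustration of how alert conditions map a measurement to a result, consider the sketch below. The function `evaluate_alert`, the predicate arguments, and the rule that a fail condition takes precedence over a warn condition are assumptions made for this example, not a description of Soda's internals.

```python
def evaluate_alert(measurement, warn_when, fail_when):
    """Return 'fail', 'warn', or 'pass' for one measurement.

    `warn_when` and `fail_when` are hypothetical predicates that stand in
    for the conditions written in an alert configuration.
    """
    # Assumption: a fail condition takes precedence over a warn condition.
    if fail_when(measurement):
        return "fail"
    if warn_when(measurement):
        return "warn"
    return "pass"


# Hypothetical conditions: warn above 5 duplicates, fail at 10 or more.
def warn_when(n):
    return n > 5


def fail_when(n):
    return n >= 10


results = [evaluate_alert(n, warn_when, fail_when) for n in (0, 7, 12)]
print(results)  # ['pass', 'warn', 'fail']
```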

built-in metric

An out-of-the-box metric that you can configure in a checks YAML file. See Metrics and checks.

check

A test for data quality that you write using the Soda Checks Language (SodaCL). Technically, it is a Python expression that checks metrics to see if they match the parameters you defined for a measurement. See Metrics and checks.
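To make the relationship between a metric, its measurement, and the value it is compared against more concrete, here is a minimal sketch. The sample rows, the `METRICS` and `COMPARATORS` tables, and the `run_check` helper are all hypothetical; Soda Library itself computes measurements with SQL queries against your data source.

```python
import operator

# Hypothetical rows standing in for a dataset.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
]

# Each metric turns the data into a single measurement.
METRICS = {
    "row_count": lambda rows, column=None: len(rows),
    "missing_count": lambda rows, column: sum(1 for r in rows if r[column] is None),
}

COMPARATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def run_check(rows, metric, comparator, threshold, column=None):
    """Collect a measurement for `metric` and compare it to `threshold`."""
    measurement = METRICS[metric](rows, column)
    passed = COMPARATORS[comparator](measurement, threshold)
    return measurement, ("pass" if passed else "fail")


print(run_check(rows, "row_count", ">", 0))               # (3, 'pass')
print(run_check(rows, "missing_count", "=", 0, "email"))  # (1, 'fail')
```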

checks YAML

The file in which you define SodaCL checks. Soda Library uses the input from this file to prepare, then run SQL queries against your data. See How Soda works.

cloud metric store

The component in Soda Cloud that stores metric measurements. This component facilitates the visualization of changes to your data over time.

collection

A saved set of filters in the Checks dashboard that you can access via a dropdown. Also known as a Saved View.

column

A column in a dataset in your data source.

configuration key

The key in the key-value pair that you use to define what qualifies as a missing or valid value in a column. A Soda scan uses the value of a column configuration key to determine if a check should pass, warn, or fail. For example, in valid format: UUID, valid format is a column configuration key and UUID is the only format of the data in the column that Soda considers valid. See Missing metrics and Validity metrics.
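The valid format example can be sketched as follows. The regular expression is an illustrative pattern for the canonical 8-4-4-4-12 hexadecimal form and may differ from the pattern Soda applies; `VALID_FORMATS` and `invalid_count` are hypothetical names used only for this example.

```python
import re

# Illustrative pattern only; the pattern Soda applies for UUID may differ.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

# Hypothetical lookup from a configuration value to a validation pattern.
VALID_FORMATS = {"uuid": UUID_PATTERN}


def invalid_count(values, configuration):
    """Count values that do not match the format named by the configuration key."""
    pattern = VALID_FORMATS[configuration["valid format"].lower()]
    return sum(1 for value in values if not pattern.fullmatch(value))


values = [
    "123e4567-e89b-12d3-a456-426614174000",  # matches the pattern
    "not-a-uuid",                            # does not match
    "123e4567e89b12d3a456426614174000",      # no hyphens, rejected by this pattern
]
configuration = {"valid format": "UUID"}
print(invalid_count(values, configuration))  # 2
```

Here the key `valid format` selects which rule applies and its value, `UUID`, selects the pattern; a check could then compare the resulting count against zero to decide whether to pass or fail.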

configuration YAML

The file in which you configure data source connection details and Soda Cloud connection details.

data source

A storage location that contains a collection of datasets, such as Snowflake, Amazon Athena, or GCP BigQuery.

dataset

A representation of a tabular data structure with rows and columns. A dataset can take the form of a table in PostgreSQL or Snowflake, a stream, or a DataFrame in a Spark application.

discussion

A collaborative messaging and check proposal space that data producers and consumers can use to establish agreed-upon rules for data quality. See: Begin a discussion and propose checks.

incident

A ticket you create and associate with a failed check result so as to track your team’s investigation and resolution of a data quality issue. See Create and track incidents.

measurement

The value for a metric that Soda Library collects during a scan.

metric

A property of the data in your dataset. See Metrics and checks.

monitor

(Deprecated) A set of details you define in Soda Cloud which Soda SQL used when it ran a scan. Now deprecated and replaced by a check.

no-code check

A SodaCL check you create via the Soda Cloud user interface.

notification

A setting you configure in a Soda Cloud agreement that defines whom to notify with check results after a scan.

recon YAML

The file in which you define SodaCL reconciliation checks. See Reconciliation checks.

scan

A command that executes checks to extract information about data in a data source. See Run a scan and view results.

scan definition

A collection of checks YAML files that contain the checks for data quality you wish to scan at a specific time, including details for which Soda Agent to use to connect to which data source. Effectively, a scan definition provides the what, when, and where to run a scan.

scan definition name

A unique identifier that you add to a programmatic scan or to the soda scan command using the -s option. Soda Cloud uses the scan definition name to correlate subsequent scan results, thus retaining a historical record of the measurements over time.
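The idea of correlating results by scan definition name can be pictured with a small in-memory sketch. The `history` store, the `record_scan_result` helper, and the names `nightly_orders` and `adhoc_test` are hypothetical; this does not reflect how Soda Cloud stores results.

```python
from collections import defaultdict

# Hypothetical in-memory store: scan definition name -> list of results.
history = defaultdict(list)


def record_scan_result(scan_definition_name, check, measurement):
    """Append a result under its scan definition name."""
    history[scan_definition_name].append({"check": check, "measurement": measurement})


# Two scans that share a name build up one history; a different name starts another.
record_scan_result("nightly_orders", "row_count", 1000)
record_scan_result("nightly_orders", "row_count", 1040)
record_scan_result("adhoc_test", "row_count", 12)

print([r["measurement"] for r in history["nightly_orders"]])  # [1000, 1040]
```

Reusing the same name across runs is what keeps the measurements in one series; changing the name would begin a separate record.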

scan schedule

The schedule you customize in Soda Cloud to instruct a Soda Agent to execute scans at a regular cadence.

Soda Agent

The self-hosted or Soda-hosted Helm chart that facilitates a secure connection between your Soda Cloud account and your data sources. See Soda Agent basic concepts.

SodaCL

The domain-specific language to define Soda Checks in a checks YAML file. A Soda Check is a test that Soda Library executes when it scans a dataset in your data source.

Soda Cloud

A web application that enables you to examine scan results and create agreements. Create a Soda Cloud account at cloud.soda.io.

Soda Core

A free, open-source, Python library and command-line tool that enables you to use the Soda Checks Language to turn user-defined input into aggregated SQL queries that test for data quality. See Soda Core in GitHub.

Soda Library

A Python library and CLI tool that is a commercial extension of Soda Core. Connect Soda Library with over a dozen data sources and Soda Cloud, and use the Soda Checks Language to turn user-defined input into aggregated SQL queries that test for data quality.

Soda Spark (Deprecated)

Soda Spark was an extension of Soda SQL that allowed you to run Soda SQL functionality programmatically on a Spark DataFrame. It has been replaced by Soda Library configured to connect Soda to Apache Spark.

Soda SQL (Deprecated)

Soda SQL was an open-source command-line tool that scanned the data in your data source. It has been replaced by Soda Library.

threshold

The value for a metric that Soda checks against during a scan. See Metrics and checks.

validity rule

In Soda Cloud, the key-value pair that you use to define what qualifies as a missing or valid value in a column. A Soda scan uses the value defined in a validity rule to determine if it should pass or fail a check. See also: configuration key.


