
Glossary

Last modified on 31-May-23

agreement

A collection of checks that serve as a contract between stakeholders that stipulates the expected and agreed-upon state of data quality in a data source.

alert configuration

A configuration in a SodaCL check that you use to explicitly specify the conditions that warrant a warn result. See Optional check configurations.
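
As a minimal sketch, an alert configuration in a checks YAML file might look like the following. The dataset name `dim_reseller`, the column `phone`, and the numeric bounds are hypothetical and chosen only for illustration.

```yaml
# Hypothetical dataset and column; the bounds are illustrative only.
checks for dim_reseller:
  - duplicate_count(phone):
      warn: when between 5 and 10   # condition that warrants a warn result
      fail: when > 10               # condition that warrants a fail result
```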

built-in metric

An out-of-the-box metric that you can configure in a checks YAML file. See Metrics and checks.

check

A test for data quality that you write using the Soda Checks Language (SodaCL). See Metrics and checks.
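
For illustration, two simple checks might be written as below. The dataset `dim_customer` and column `email_address` are assumed names, not taken from this glossary.

```yaml
# Hypothetical dataset and column names.
checks for dim_customer:
  - row_count > 0                       # metric: row_count, threshold: 0
  - missing_count(email_address) = 0    # metric takes a column as its argument
```

Each list item pairs a metric with a threshold; the scan result depends on whether the measured value satisfies the comparison.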

checks YAML

The file in which you define SodaCL checks. Soda Core uses the input from this file to prepare, then run SQL queries against your data. See How Soda Core works.

cloud metric store

The place in Soda Cloud that stores the values of measurements collected over time as Soda Core executes checks.

column

A column in a dataset in your data source.

configuration key

The key in the key-value pair that you use to define what qualifies as a missing or valid value in a column. A Soda scan uses the value of a column configuration key to determine if a check should pass, warn, or fail. For example, in valid format: UUID, valid format is the column configuration key and UUID is the only data format in the column that Soda considers valid. See Missing metrics and Validity metrics.
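
A sketch of how the configuration key from the example above might sit inside a check follows. The dataset `dim_customer` and column `customer_id` are hypothetical, and the lowercase spelling of the format name is an assumption about how SodaCL expects it to be written.

```yaml
# Hypothetical dataset and column names.
checks for dim_customer:
  - invalid_count(customer_id) = 0:
      valid format: uuid   # configuration key: "valid format"; value: the only format treated as valid
```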

configuration YAML

The file in which you configure data source connection details and Soda Cloud connection details. See How Soda Core works.
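
As a rough sketch, a configuration YAML for a PostgreSQL data source with a Soda Cloud connection could resemble the following. The data source name, the environment variable names, and the connection values are placeholders, and the exact keys are an assumption that may differ by data source type and Soda Core version.

```yaml
# Placeholder values throughout; keys may vary by data source type and version.
data_source my_datasource:
  type: postgres
  host: localhost
  port: "5432"
  username: ${POSTGRES_USER}        # read from an environment variable
  password: ${POSTGRES_PASSWORD}
  database: postgres
  schema: public

soda_cloud:
  host: cloud.soda.io
  api_key_id: ${API_KEY_ID}
  api_key_secret: ${API_KEY_SECRET}
```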

data source

A storage location that contains a collection of datasets, such as Snowflake, Amazon Athena, or GCP BigQuery.

dataset

A representation of a tabular data structure with rows and columns. A dataset can take the form of a table in PostgreSQL or Snowflake, a stream, or a DataFrame in a Spark application.

incident

A ticket you create and associate with a failed check result to track your team’s investigation and resolution of a data quality issue. See Create and track incidents.

measurement

The value for a metric that Soda Core collects during a scan.

metric

A property of the data in your dataset. See Metrics and checks.

metric store

The component in Soda Cloud that stores metric measurements. This component facilitates the visualization of changes to your data over time.

monitor

(Deprecated) A set of details you defined in Soda Cloud that Soda SQL used when it ran a scan. Monitors have been replaced by checks.

notification

A setting you configure in a Soda Cloud agreement that defines whom to notify with check results after a scan.

scan

A command that executes checks to extract information about data in a data source. See Run a Soda Core scan.

scan definition

A collection of checks YAML files that contain the checks for data quality you wish to scan at a specific time, including details for which Soda Agent to use to connect to which data source. Effectively, a scan definition provides the what, when, and where to run a scan.

scan definition name

A unique identifier that you add to a programmatic scan or to the soda scan command using the -s option. Soda Cloud uses the scan definition name to correlate subsequent scan results, thus retaining a historical record of the measurements over time.

Soda Agent

The Helm chart you deploy in your Kubernetes cluster to facilitate a secure connection between your Soda Cloud account and your data sources. See Soda Agent basic concepts.

SodaCL

The domain-specific language to define Soda Checks in a checks YAML file. A Soda Check is a test that Soda Core executes when it scans a dataset in your data source. See SodaCL documentation.

Soda Cloud

A web application that enables you to examine scan results and create agreements. Create a Soda Cloud account at cloud.soda.io. If you also use Soda Core, you can connect Soda Core to Soda Cloud.

Soda Core

A free, open-source, command-line tool that enables you to use the Soda Checks Language to turn user-defined input into aggregated SQL queries. You can use it as a stand-alone tool to monitor data quality from the command line, or connect it to a Soda Cloud account to monitor your data using a web application. See Soda Core documentation.

Soda Spark (Deprecated)

Soda Spark was an extension of Soda SQL that allowed you to run Soda SQL functionality programmatically on a Spark DataFrame. It has been replaced by Soda Core configured to connect Soda to Apache Spark.

Soda SQL (Deprecated)

Soda SQL was an open-source command-line tool that scanned the data in your data source. It has been replaced by Soda Core.

threshold

The value for a metric that Soda checks against during a scan. See Metrics and checks.

validity rule

In Soda Cloud, the key-value pair that you use to define what qualifies as a missing or valid value in a column. A Soda scan uses the value defined in a validity rule to determine if it should pass or fail a check. See also: configuration key.


