
Glossary

alert

A setting that you configure in a Soda Cloud monitor by specifying key:value thresholds which, if exceeded, trigger a notification. See also: notification.

analyze

A Soda SQL CLI command that sifts through the contents of your data source and automatically prepares a scan YAML file for each table. See Create a scan YAML file.

built-in metric

An out-of-the-box metric that you can configure in a scan YAML file. There are two kinds of built-in metrics: dataset, which are metrics that apply to an entire dataset, and column, which are metrics that apply to individual columns in a dataset.

By contrast, you can use custom metrics, also known as SQL metrics, to define SQL queries in a scan YAML file in Soda SQL.

column

A column in a dataset in your data source.

column configuration key

The key in the key-value pair that you use to define what qualifies as a valid value in a column. A Soda scan uses the value of a column configuration key to determine if it should pass or fail a test. For example, in valid_format: UUID, valid_format is a column configuration key and UUID is the only format of the data in the column that Soda considers valid. See Column configuration keys.
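
As a rough illustration, a column configuration key sits under a column's entry in the scan YAML file. The sketch below assumes a hypothetical column named id. The lowercase spelling uuid and the invalid_percentage test are assumptions about how the format and the test would be written, so check Column configuration keys for the exact values.

```yaml
# Fragment of a scan YAML file (the column name "id" is hypothetical)
columns:
  id:
    # Column configuration key: only UUID-formatted values count as valid
    valid_format: uuid
    tests:
      # The test fails if any value in the column is invalid
      - invalid_percentage == 0
```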

column metric

A property of the data of a single column in your data source. Use a column metric to define tests that apply to specific columns in a dataset during a scan. See Column metrics.

configuration key

The key in the key-value pair that you use to define configuration in your scan YAML file. See Scan YAML configuration keys.

create

A Soda SQL CLI command that creates a warehouse directory.

custom metric

A metric you define in your scan YAML file using SQL queries. Also known as a SQL metric. SQL metrics essentially enable you to add SQL queries to your scan YAML file so that Soda SQL runs them during a scan. See SQL metrics.
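
A minimal sketch of a custom metric in a scan YAML file might look like the following. The table customer_transactions, its columns, the alias total_volume_us, and the threshold are all hypothetical. See SQL metrics for the authoritative syntax.

```yaml
# Fragment of a scan YAML file; table, column, and metric names are hypothetical
sql_metrics:
  - sql: |
      SELECT sum(volume) AS total_volume_us
      FROM customer_transactions
      WHERE country = 'US'
    tests:
      # The column alias from the query serves as the metric name in the test
      - total_volume_us > 5000
```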

data source

A storage location that contains a collection of datasets. A warehouse in Soda SQL is one form of data source. A data source may also imply a compute engine that Soda SQL uses to compute measurements.

dataset

A representation of a tabular data structure with rows and columns. A dataset can take the form of a table in PostgreSQL or Snowflake, a stream in Kafka, or a dataframe in a Spark application.

dataset metric

A property of the data in a dataset in your data source. Use a dataset metric to define tests that apply to all the columns in the dataset during a scan. See Dataset metrics.

default metric

See built-in metric.

env_vars YAML

The file in your local user home directory that stores your data source login credentials.
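
For orientation, an env_vars YAML file could be laid out roughly as below. The top-level key my_project_postgres is a hypothetical warehouse name, and the variable names and placeholder values are assumptions for illustration only.

```yaml
# Hypothetical env_vars YAML; the top-level key matches a warehouse name
my_project_postgres:
  POSTGRES_USERNAME: xxxxxx   # placeholder, not a real credential
  POSTGRES_PASSWORD: yyyyyy   # placeholder, not a real credential
```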

measurement

The value for a metric that Soda SQL checks against during a scan. For example, in the test row_count == 5, row_count is the metric and 5 is the measurement.

metric

A property of the data in your dataset. See Metrics.

metric store

The component in Soda Cloud that stores metric measurements. This component facilitates the visualization of changes to your data over time.

monitor

A set of details you define in Soda Cloud which Soda SQL uses when it runs a scan. Sometimes referred to in other systems as a “data quality rule”.
For a new monitor, you define: a dataset and column against which to execute a test, a test, an alert, a notification, an owner, and a description. See Create monitors and alerts.

notification

A setting you configure in a Soda Cloud monitor that defines whom to notify when a data issue triggers an alert. See also: alert.

scan

A command that executes tests to extract information about data in a data source. See Run a scan.

scan YAML

The file in which you configure scan metrics and tests. Soda SQL uses the input from this file to prepare, then run SQL queries against your data. See Scan YAML.
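
A small scan YAML file, sketched under the assumption of a hypothetical table named orders, might combine a list of metrics with a dataset-level test as follows. See Scan YAML for the full set of configuration keys.

```yaml
# Hypothetical scan YAML for a table named "orders"
table_name: orders
# Metrics that Soda SQL computes during the scan
metrics:
  - row_count
  - missing_count
# Tests that Soda SQL evaluates against the resulting measurements
tests:
  - row_count > 0
```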

Soda Cloud

A free web application that enables you to examine scan results and create monitors and alerts. Create a Soda Cloud account at cloud.soda.io. If you also use Soda SQL, you can connect Soda SQL to Soda Cloud.

Soda SQL

An open-source command-line tool that scans the data in your data source. You can use it as a stand-alone tool to monitor data quality from the command line, or connect it to a Soda Cloud account to monitor your data using a web application. Start by installing Soda SQL.

SQL metric

See custom metric.

table

A type of dataset.

table metric

See dataset metric.

test

A Python expression that, during a scan, checks metrics to see if they match the parameters you defined for a measurement. As a result of a scan, a test either passes or fails. See Tests.

validity rule

In Soda Cloud, the key-value pair that you use to define what qualifies as a valid value in a column. A Soda scan uses the value defined in a validity rule to determine if it should pass or fail a test. See also: column configuration key.

warehouse

A type of data source.

warehouse directory

The top directory in the Soda SQL directory structure, which contains your warehouse YAML file and, generally, your /tables directory.

warehouse YAML

The file in which you configure data source connection details and Soda Cloud connection details. See Warehouse YAML and Connect to Soda Cloud.
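
As a hedged sketch, a warehouse YAML file for a PostgreSQL data source could look like the following. The warehouse name, host, database, schema, and environment variable names are all hypothetical. The env_var(...) references assume that the credentials are stored in the env_vars YAML file rather than in this file.

```yaml
# Hypothetical warehouse YAML for a PostgreSQL data source
name: my_project_postgres
connection:
  type: postgres
  host: localhost
  username: env_var(POSTGRES_USERNAME)
  password: env_var(POSTGRES_PASSWORD)
  database: sodasql
  schema: public
# Optional: connection details for Soda Cloud (key names are placeholders)
soda_account:
  host: cloud.soda.io
  api_key_id: env_var(API_PUBLIC)
  api_key_secret: env_var(API_PRIVATE)
```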



Last modified on 15-Sep-21
