A setting that you configure in a Soda Cloud monitor by specifying key:value thresholds which, if exceeded, trigger a notification. See also: notification.
A Soda SQL CLI command that sifts through the contents of your data source and automatically prepares a scan YAML file for each table. See Create a scan YAML file.
An out-of-the-box metric that you can configure in a scan YAML file. There are two kinds of built-in metrics: dataset, which are metrics that apply to an entire dataset, and column, which are metrics that apply to individual columns in a dataset.
By contrast, you can use custom metrics, also known as SQL metrics, to define SQL queries in a scan YAML file in Soda SQL.
A column in a dataset in your data source.
The key in the key-value pair that you use to define what qualifies as a valid value in a column. A Soda scan uses the value of a column configuration key to determine if it should pass or fail a test. For example, in valid_format: UUID, valid_format is a column configuration key and UUID is the only format of the data in the column that Soda considers valid. See Column configuration keys.
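As a rough illustration of what a validity check of this kind amounts to, the following Python sketch counts the values in a column that do not parse as UUIDs. It is not Soda SQL's implementation: the helper name count_invalid_uuids and the sample values are hypothetical, and Python's uuid parser is more lenient than a strict format check (for example, it accepts UUIDs written without hyphens).

```python
import uuid


def count_invalid_uuids(values):
    """Return how many values fail to parse as a UUID (hypothetical helper)."""
    invalid = 0
    for value in values:
        try:
            # uuid.UUID raises ValueError for strings that are not valid UUIDs.
            uuid.UUID(str(value))
        except ValueError:
            invalid += 1
    return invalid


# Hypothetical column contents: one valid UUID and two invalid values.
sample = ["123e4567-e89b-12d3-a456-426614174000", "not-a-uuid", ""]
invalid_count = count_invalid_uuids(sample)
```

A test on the column could then compare a figure such as invalid_count against a threshold to decide whether the test passes or fails.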
A property of the data of a single column in your data source. Use a column metric to define tests that apply to specific columns in a dataset during a scan. See Column metrics.
The key in the key-value pair that you use to define configuration in your scan YAML file. See Scan YAML configuration keys.
A Soda SQL CLI command that creates a warehouse directory.
A metric you define in your scan YAML file using SQL queries. Also known as a SQL metric. SQL metrics essentially enable you to add SQL queries to your scan YAML file so that Soda SQL runs them during a scan. See SQL metrics.
A storage location that contains a collection of datasets. A warehouse in Soda SQL is one form of data source. A data source may also imply a compute engine that Soda SQL uses to compute measurements.
A representation of a tabular data structure with rows and columns. A dataset can take the form of a table in PostgreSQL or Snowflake, a stream in Kafka, or a dataframe in a Spark application.
A property of the data in a dataset in your data source. Use a dataset metric to define tests that apply to all the columns in the dataset during a scan. See Dataset metrics.
See built-in metric.
The file in your local user home directory that stores your data source login credentials.
The value for a metric that Soda SQL checks against during a scan. For example, in the test row_count = 5, row_count is the metric and 5 is the measurement.
A property of the data in your dataset. See Metrics.
The component in Soda Cloud that stores metric measurements. This component facilitates the visualization of changes to your data over time.
A set of details you define in Soda Cloud which Soda SQL uses when it runs a scan. Sometimes referred to in other systems as a “data quality rule”.
For a new monitor, you define: a dataset and column against which to execute a test, a test, an alert, a notification, an owner, and a description. See Create monitors and alerts.
A setting you configure in a Soda Cloud monitor that defines whom to notify when a data issue triggers an alert. See also: alert.
A command that executes tests to extract information about data in a data source. See Run a scan.
The file in which you configure scan metrics and tests. Soda SQL uses the input from this file to prepare, then run SQL queries against your data. See Scan YAML.
A free web application that enables you to examine scan results and create monitors and alerts. Create a Soda Cloud account at cloud.soda.io. If you also use Soda SQL, you can connect Soda SQL to Soda Cloud.
An open-source command-line tool that scans the data in your data source. You can use this as a stand-alone tool to monitor data quality from the command line, or connect it to a Soda Cloud account to monitor your data using a web application. Start by installing Soda SQL.
See custom metric.
A type of dataset.
See dataset metric.
A Python expression that, during a scan, checks metrics to see if they match the parameters you defined for a measurement. As a result of a scan, a test either passes or fails. See Tests.
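The relationship between a metric, a measurement, and a test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Soda SQL evaluates tests internally: the function evaluate_test and the measurement values are hypothetical, and eval with builtins disabled is not a secure sandbox.

```python
def evaluate_test(expression, measurements):
    """Evaluate a test expression against measurements; True means the test passes."""
    # Builtins are disabled so the expression can only refer to metric names
    # (illustrative only; this does not make eval safe for untrusted input).
    return bool(eval(expression, {"__builtins__": {}}, dict(measurements)))


# Hypothetical measurements produced by a scan: metric name -> measured value.
measurements = {"row_count": 5, "missing_count": 0}

# Each test is a Python expression over metric names.
tests = ["row_count > 0", "missing_count == 0", "row_count == 10"]
results = {expr: evaluate_test(expr, measurements) for expr in tests}
```

With these measurements, the first two tests pass and the third fails, because the measured row_count is 5, not 10.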
In Soda Cloud, the key-value pair that you use to define what qualifies as a valid value in a column. A Soda scan uses the value defined in a validity rule to determine if it should pass or fail a test. See also: column configuration key.
A type of data source.
The top directory in the Soda SQL directory structure which contains your warehouse YAML file and, generally, your
Last modified on 15-Sep-21