Write SodaCL checks

Soda Checks Language is a human-readable, domain-specific language for data reliability. You use SodaCL to define Soda Checks in a checks YAML file.

Soda Checks Language (SodaCL) is a YAML-based, domain-specific language for data reliability. Together with Soda tools, you use SodaCL to write checks for data quality, then run a scan of the data in your data source to execute those checks. A Soda Check is a test that Soda performs when it scans a dataset in your data source.

A Soda scan executes the checks you write in an agreement, in a checks YAML file, or inline in a programmatic invocation, and returns a result for each check: pass, fail, or error. Optionally, you can configure a check to warn instead of fail by setting an alert configuration.
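As a minimal sketch of an alert configuration, the check below returns a warn result for a moderate number of duplicates and a fail result beyond that. The dataset and column names are borrowed from the examples on this page; the thresholds are illustrative assumptions, not recommended values.

```yaml
# Illustrative thresholds only; adjust to your own data.
checks for dim_customer:
  - duplicate_count(phone):
      warn: when between 1 and 10
      fail: when > 10
```

With this configuration, a scan that finds no duplicate phone values returns a pass result for the check.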

As a step in the Get started roadmap, this guide offers instructions to define your first SodaCL checks in the Soda Cloud UI as no-code checks or in agreements, in a checks YAML file, or within a programmatic invocation of Soda.

Get started roadmap

  1. Choose a flavor of Soda

  2. Set up Soda: install, deploy, or invoke

  3. Write SodaCL checks 📍 You are here!

  4. Run scans and review results

  5. Organize, alert, investigate


Examples

# Checks for basic validations
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
  - invalid_percent(phone) < 1 %:
      valid format: phone number
  - invalid_count(number_cars_owned) = 0:
      valid min: 1
      valid max: 6
  - duplicate_count(phone) = 0

checks for dim_product:
  - avg(safety_stock_level) > 50
  # Checks for schema changes
  - schema:
      name: Find forbidden, missing, or wrong type
      warn:
        when required column missing: [dealer_price, list_price]
        when forbidden column present: [credit_card]
        when wrong column type:
          standard_cost: money
      fail:
        when forbidden column present: [pii*]
        when wrong column index:
          model_name: 22

  # Check for freshness
  - freshness (start_date) < 1d

# Check for referential integrity
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)

Define SodaCL checks

🎥 Watch a 5-minute video for no-code checks and discussions, if you like!

✖️ Requires Soda Core Scientific
✖️ Requires Soda Core
✖️ Requires Soda Library + Soda Cloud
✔️ Requires Soda Agent + Soda Cloud

Prerequisites

  • You, or an Admin on your Soda Cloud account, have deployed a Soda Agent version 0.8.52 or greater, and connected it to your Soda Cloud account.

  • You, or an Admin on your Soda Cloud account, have added a new data source via the Soda Agent in your Soda Cloud account and configured the data source to discover the datasets for which you want to write no-code checks. (Soda must have access to dataset names and column names to present those values in dropdown menus during no-code check creation.)

  • You must have permission to edit the dataset; see Manage dataset roles.

Create a new check

SodaCL includes over 25 built-in metrics that you can use to write checks, a subset of which are accessible via no-code check creation. The table below lists the checks available to create via the no-code interface; see the SodaCL reference for detailed information about each metric or check.

Missing | Validity | Numeric | Duplicate | Row count

Freshness | Schema | SQL Failed rows | SQL Metric

  1. As a user with permission to edit the dataset to which you wish to add checks, navigate to the dataset, then click Add Check. You can only create a check via the no-code interface for datasets in data sources connected via a Soda Agent.

  2. Select the type of check you wish to create, then complete the form to create the check. Refer to the table below for guidance on the values to enter.

  3. Optionally, test your check, then click Propose check to initiate a Discussion with colleagues. Soda executes the check during the next scan according to the schedule you selected, or whenever a Soda Cloud user runs the scheduled scan manually. Be aware that a schema check requires a minimum of two measurements before it yields a useful check result, because it needs at least one historical measurement of the existing schema against which to compare a new measurement. Thus, the first time Soda executes this check, the result is [NOT EVALUATED], indicated by a gray question mark status icon.

  4. Click Add Check to include the new, no-code check in the next scheduled scan of the dataset. Note that a user with Viewer permissions cannot add a check; they can only propose checks.

  5. Optionally, you can manually execute your check immediately. From the dataset’s page, locate the check you just created and click the stacked dots, then select Execute Check. Soda executes only your check.
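The two-measurement behavior described in step 3 is easiest to picture with a schema check that watches for changes over time. The sketch below shows how such a check might look when written directly in SodaCL, assuming the no-code Schema check corresponds to a `when schema changes` condition; the dataset name comes from the examples on this page and the check name is a placeholder.

```yaml
checks for dim_customer:
  - schema:
      name: Detect any schema change   # placeholder name
      warn:
        when schema changes: any
```

On the first scan there is no earlier schema measurement to compare against, so the check is not evaluated; later scans compare the current schema with the stored measurement.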

Field or Label
Guidance

Dataset

Select the dataset to which you want the check to apply.

Check Name

Provide a unique name for your check.

Add to Scan Definition

Select the scan definition to which you wish to add your check. Optionally, you can click create a new Scan Definition if you want Soda to execute the check more or less frequently, or at a different time of day than existing scan definitions dictate. See Manage scheduled scans for details.

Filter fields

Optionally, add an in-check filter to apply conditions that specify a portion of the data against which Soda executes the check.

Define Metric/Values/Column/SQL

As each metric or check requires different values, refer to SodaCL reference for detailed information about each metric or check. Learn more about how Soda uses OpenAI to process the input for SQL and Regex assistants in no-code checks.

Alert Level

Select the check result state(s) for which you wish to be notified: Fail, Warn, or Fail and Warn. See View scan results for details. By default, alert notifications for your check go to the Dataset Owner. See Define alert notification rules to set up more alert notifications.

Fail Condition, Value, and Value Type

Set the values of these fields to specify the threshold that constitutes a fail or warn check result. For example, if you are creating a Duplicate Check and you want the check to fail when more than 5% of the rows in the column you identified contain duplicates, set:
  • Fail Condition to >
  • Value to 5
  • Value Type to Percent

Attribute fields

Select from among the list of existing attributes to apply to your check so as to organize your checks and alert notifications in Soda Cloud. Refer to Add check attributes for details.
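For orientation, the form fields above map loosely onto keys you would otherwise write by hand in SodaCL. The following is a rough sketch under that assumption, not the exact output of the no-code interface. The check names, the filter condition, and the `department` attribute are hypothetical; an attribute must already exist in Soda Cloud before a check can use it.

```yaml
checks for dim_customer:
  # Fail Condition ">", Value "5", Value Type "Percent"
  - duplicate_percent(phone):
      name: Duplicate phone numbers
      fail: when > 5%

  # Filter fields and Attribute fields
  - missing_count(birth_date) = 0:
      name: Birth date present for car owners
      filter: number_cars_owned > 0
      attributes:
        department: Marketing   # hypothetical attribute and value
```

The filter limits the second check to the rows that satisfy the condition, so rows where `number_cars_owned` is 0 are not counted.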

About Soda AI assistants

Powered by OpenAI's GPT-3.5 and GPT-4, the generative SQL and regular expression assistants available in Soda Cloud's no-code checks help you write the queries and expressions you can add to validity, missing, SQL failed rows, and SQL metric checks.

When creating a Missing or Validity check in the no-code user interface in Soda Cloud, you can click for help from the Soda AI Regex Assistant to translate an English request into a regular expression you can use to define missing or valid values. Similarly, access the Soda AI SQL Assistant in SQL Failed Rows or SQL Metric checks to generate SQL queries based on requests in plain English.
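To show where the assistants' output ends up, here is a sketch of the equivalent hand-written SodaCL for a validity check with a regular expression, a failed rows check, and a SQL metric check. The pattern, queries, thresholds, and the `avg_cars` metric name are all illustrative assumptions, not assistant output; the dataset and columns are reused from the examples on this page.

```yaml
checks for dim_customer:
  # Validity check: the regular expression defines what counts as valid
  - invalid_count(phone) = 0:
      valid regex: ^[0-9]{3}-[0-9]{3}-[0-9]{4}$

  # SQL Failed Rows check: every row the query returns is a failed row
  - failed rows:
      name: Customers with an implausible number of cars
      fail query: |
        SELECT *
        FROM dim_customer
        WHERE number_cars_owned > 6

  # SQL Metric check: the query supplies the value of a custom metric
  - avg_cars between 1 and 3:
      avg_cars query: |
        SELECT AVG(number_cars_owned)
        FROM dim_customer
```

Whether a query or expression is written by hand or generated, review it against your own data source's SQL dialect before adding it to a check.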

Soda AI SQL and Regex Assistants are enabled for new Soda Cloud accounts by default. If you do not wish to use them, navigate to your avatar > Organization Settings, then click to remove the check from the box for Enable SQL and Regex Assistants Powered By OpenAI.

Existing Soda customers can review and accept the revised Terms & Conditions, then request access.

Soda acknowledges that the output of the assistants may not be fully accurate or reliable. Leverage the assistants’ output, but be sure to carefully review all queries and expressions you add to your checks. Refer to Soda’s General Terms & Conditions in the Use of AI section for further details.

Be aware that Soda shares the content of all SQL and Regex assistant prompts/input and output with OpenAI to perform the processing that yields the output. Following OpenAI’s suggestion, Soda also sends metadata, such as schema information, to OpenAI along with the prompts/input in order to improve the quality of the output. Read more about OpenAI at https://openai.com/policies.

The Ask AI Assistant is powered by kapa.ai and replaces SodaGPT. While Soda collaborates with third parties to develop certain AI features, it’s important to note that Soda does not share any primary data, such as data samples or data profiling details, with its partners. Soda only shares prompts and some schema information with OpenAI and kapa.ai to enhance the accuracy of the assistants.

Define alert notification rules

By default, alert notifications for your no-code check go to the Dataset Owner and Check Owner. If you wish to send alerts elsewhere, in addition to the owner, create a notification rule.

For a new rule, you define conditions for sending notifications including the severity of a check result and whom to notify when bad data triggers an alert.

In Soda Cloud, navigate to your avatar > Notification Rules, then click New Notification Rule. Follow the guided steps to complete the new rule. Use the table below for insight into the values to enter in the fields and editing panels.

Field or Label
Guidance

Name

Provide a unique identifier for your notification.

For

Select All Checks, or select Selected Checks to use conditions to identify specific checks to which you want the rule to apply. You can identify checks according to several attributes such as Data Source Name, Dataset Name, or Check Name.

Notify Recipient

Select the destination to which this rule sends its notifications. For example, you can send the rule’s notifications to a channel in Slack.

Notify About

Identify the notifications this rule sends based on the severity of the check result: warn, fail, or both.

Edit an existing check

  1. As a user with permission to do so, navigate to the dataset in which the no-code check exists.

  2. To the right of the check you wish to edit, click the stacked dots, then select Edit Check. You can only edit a check via the no-code interface if it was first created as a no-code check, as indicated by the cloud icon in the Origin column of the table of checks.

  3. Adjust the check as needed, test your check, then save. Soda executes the check during the next scan according to the scan definition you selected.

  4. Optionally, you can execute your check immediately. Locate the check you just edited and click the stacked dots, then select Execute Check. Soda executes only your check.

Next

  1. Choose a flavor of Soda

  2. Set up Soda: install, deploy, or invoke

  3. Write SodaCL checks

  4. Run scans and review results

  5. Organize, alert, investigate

Need help? Join the Soda community on Slack.
