

✔ An open-source CLI tool and Python library for data reliability

✔ Compatible with Soda Checks Language (SodaCL) and Soda Cloud

✔ Enables data quality testing both in and out of your data pipeline, for data observability and reliability

✔ Enables programmatic scans on a time-based schedule
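
As a rough illustration of a programmatic scan, the sketch below wraps the Soda Core `Scan` class in a small helper. It assumes the `soda-core` package and a data source connector are installed. The data source name `my_datasource`, the file `configuration.yml`, and the helper `run_scan` are hypothetical placeholders, not names from this page. Scheduling is left to an external tool such as cron or an orchestrator.

```python
# Hypothetical SodaCL checks, passed to the scan as a string.
CHECKS = """
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
"""


def run_scan(data_source_name, config_path, checks=CHECKS):
    """Run one Soda scan and return (exit_code, logs_text).

    The import is deferred so this module loads even where soda-core
    is not installed; calling the function requires the package.
    """
    from soda.scan import Scan

    scan = Scan()
    scan.set_data_source_name(data_source_name)
    scan.add_configuration_yaml_file(file_path=config_path)
    scan.add_sodacl_yaml_str(checks)
    exit_code = scan.execute()
    return exit_code, scan.get_logs_text()


if __name__ == "__main__":
    # Placeholder values; replace with your own data source and config file.
    code, logs = run_scan("my_datasource", "configuration.yml")
    print(logs)
```

A scheduler can call a script like this at a fixed interval and act on the returned exit code.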

Example checks

# Checks for basic validations
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
  - invalid_percent(phone) < 1 %:
      valid format: phone number
  - invalid_count(number_cars_owned) = 0:
      valid min: 1
      valid max: 6
  - duplicate_count(phone) = 0

checks for dim_product:
  - avg(safety_stock_level) > 50
# Checks for schema changes
  - schema:
      name: Find forbidden, missing, or wrong type
      warn:
        when required column missing: [dealer_price, list_price]
        when forbidden column present: [credit_card]
        when wrong column type:
          standard_cost: money
      fail:
        when forbidden column present: [pii*]
        when wrong column index:
          model_name: 22

# Check for freshness 
  - freshness(start_date) < 1d

# Check for referential integrity
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)
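
To make the intent of a few of these metrics concrete, here is a minimal plain-Python sketch that computes analogous values over in-memory rows. It does not use Soda and is not how Soda evaluates checks, which run against the data source. The sample rows and helper functions are invented for illustration, and the duplicate-counting rule used here (distinct values that occur more than once) is an assumption that may differ from Soda's exact semantics.

```python
from collections import Counter

# Invented sample rows standing in for a dim_customer table.
rows = [
    {"phone": "555-0100", "birth_date": "1980-01-01", "number_cars_owned": 2},
    {"phone": "555-0101", "birth_date": None, "number_cars_owned": 7},
    {"phone": "555-0100", "birth_date": "1975-06-30", "number_cars_owned": 1},
]


def row_count(rows):
    """Number of rows in the dataset."""
    return len(rows)


def missing_count(rows, column):
    """Number of rows where the column value is None."""
    return sum(1 for row in rows if row.get(column) is None)


def invalid_count(rows, column, valid_min, valid_max):
    """Number of non-missing values outside [valid_min, valid_max]."""
    return sum(
        1
        for row in rows
        if row.get(column) is not None
        and not (valid_min <= row[column] <= valid_max)
    )


def duplicate_count(rows, column):
    """Number of distinct non-missing values that occur more than once."""
    counts = Counter(
        row[column] for row in rows if row.get(column) is not None
    )
    return sum(1 for n in counts.values() if n > 1)


if __name__ == "__main__":
    print("row_count:", row_count(rows))
    print("missing_count(birth_date):", missing_count(rows, "birth_date"))
    print("invalid_count(number_cars_owned):",
          invalid_count(rows, "number_cars_owned", 1, 6))
    print("duplicate_count(phone):", duplicate_count(rows, "phone"))
```

With these sample rows, the checks `missing_count(birth_date) = 0`, `invalid_count(number_cars_owned) = 0`, and `duplicate_count(phone) = 0` would each fail, since every helper returns 1.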

Access the Soda Core open-source documentation.

Last modified on 30-Sep-22