✔ An open-source CLI tool and Python library for data reliability
✔ Compatible with Soda Checks Language (SodaCL) and Soda Cloud
✔ Enables data quality testing both in and out of your pipeline, for data observability, and for data monitoring
✔ Integrates with your data pipeline to run a Soda scan, or runs programmatic scans on a time-based schedule
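As a rough illustration of a programmatic scan, the sketch below wraps the Python library's `Scan` class in a small function. It assumes the `soda-core` package plus a connector for your data source are installed; the function name `run_scan`, the data source name, and the configuration path are hypothetical placeholders, not values from this documentation.

```python
# Checks written in SodaCL, passed to the scan as a string.
CHECKS = """
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
"""


def run_scan(data_source_name: str, configuration_path: str) -> int:
    """Run one Soda scan and return its exit code (hypothetical helper)."""
    # Imported lazily so this module still loads where soda-core is absent.
    from soda.scan import Scan

    scan = Scan()
    # The name must match a data source defined in the configuration file.
    scan.set_data_source_name(data_source_name)
    scan.add_configuration_yaml_file(file_path=configuration_path)
    scan.add_sodacl_yaml_str(CHECKS)

    exit_code = scan.execute()
    # Print the scan logs so a scheduler or pipeline run captures them.
    print(scan.get_logs_text())
    return exit_code
```

A pipeline task or scheduled job could then call, for example, `run_scan("my_datasource", "configuration.yml")` and treat a non-zero return value as a signal to investigate.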
Example checks
# Checks for basic validations
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
  - invalid_percent(phone) < 1 %:
      valid format: phone number
  - invalid_count(number_cars_owned) = 0:
      valid min: 1
      valid max: 6
  - duplicate_count(phone) = 0

checks for dim_product:
  - avg(safety_stock_level) > 50
  # Checks for schema changes
  - schema:
      name: Find forbidden, missing, or wrong type
      warn:
        when required column missing: [dealer_price, list_price]
        when forbidden column present: [credit_card]
        when wrong column type:
          standard_cost: money
      fail:
        when forbidden column present: [pii*]
        when wrong column index:
          model_name: 22
  # Check for freshness
  - freshness(start_date) < 1d

# Check for referential integrity
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)
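To make the intent of a few of these checks concrete, here is a plain-Python sketch of what `row_count`, `missing_count`, and `duplicate_count` measure. It is not how Soda Core evaluates checks; the sample rows are invented, and the reading of `duplicate_count` as "distinct non-null values that occur more than once" is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical sample rows standing in for the dim_customer dataset.
rows = [
    {"phone": "555-0100", "birth_date": "1980-04-12"},
    {"phone": "555-0101", "birth_date": None},
    {"phone": "555-0100", "birth_date": "1975-09-30"},
]


def row_count(rows):
    """Total number of rows in the dataset."""
    return len(rows)


def missing_count(rows, column):
    """Number of rows where the column is None (NULL)."""
    return sum(1 for row in rows if row.get(column) is None)


def duplicate_count(rows, column):
    """Number of distinct non-null values that occur more than once (assumed semantics)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sum(1 for n in counts.values() if n > 1)


# Evaluate each metric against the threshold used in the example checks.
results = {
    "row_count between 10 and 1000": 10 <= row_count(rows) <= 1000,
    "missing_count(birth_date) = 0": missing_count(rows, "birth_date") == 0,
    "duplicate_count(phone) = 0": duplicate_count(rows, "phone") == 0,
}

for check, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {check}")
```

With these three sample rows every check fails: there are fewer than 10 rows, one `birth_date` is missing, and one phone number appears twice.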
Access the Soda Core open-source documentation.
Last modified on 01-Jul-22