✔ An open-source CLI tool and Python library for data reliability
✔ Compatible with Soda Checks Language (SodaCL) and Soda Cloud
✔ Enables data quality testing both in and out of your pipeline, for data observability, and for data monitoring
✔ Integrates with a data pipeline to run a Soda scan, or runs programmatic scans on a time-based schedule
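A programmatic scan from Python might look like the following minimal sketch. It assumes the `soda-core` package and a matching data source package are installed. The function name `run_scan` and its three arguments are hypothetical and chosen only for illustration.

```python
def run_scan(data_source, configuration_file, checks_file):
    """Run one Soda scan and return True when no check fails.

    All three arguments are assumptions for illustration: the name of a
    data source defined in the configuration file, the path to that
    configuration YAML file, and the path to a SodaCL checks YAML file.
    """
    # Imported lazily so this module still loads where soda-core is absent.
    from soda.scan import Scan

    scan = Scan()
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file(configuration_file)
    scan.add_sodacl_yaml_file(checks_file)
    scan.execute()

    # Print the scan logs so a scheduler or pipeline captures them.
    print(scan.get_logs_text())
    return not scan.has_check_fails()
```

A pipeline step or scheduled job could call `run_scan(...)` and stop downstream processing when it returns `False`.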
# Checks for basic validations
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
  - invalid_percent(phone) < 1 %:
      valid format: phone number
  - invalid_count(number_cars_owned) = 0:
      valid min: 1
      valid max: 6
  - duplicate_count(phone) = 0

checks for dim_product:
  - avg(safety_stock_level) > 50
  # Checks for schema changes
  - schema:
      name: Find forbidden, missing, or wrong type
      warn:
        when required column missing: [dealer_price, list_price]
        when forbidden column present: [credit_card]
        when wrong column type:
          standard_cost: money
      fail:
        when forbidden column present: [pii*]
        when wrong column index:
          model_name: 22
  # Check for freshness
  - freshness(start_date) < 1d

# Check for referential integrity
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)
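To make the meaning of a few of the `dim_customer` checks above concrete, here is a plain-Python sketch that evaluates them over in-memory rows. It is not how Soda Core computes metrics (Soda runs queries against the data source). The sample rows and helper functions are hypothetical. The sketch also assumes that `duplicate_count` counts distinct values that occur more than once.

```python
from collections import Counter

# Hypothetical sample rows standing in for the dim_customer table.
rows = [
    {"phone": "555-0100", "birth_date": "1980-01-01", "number_cars_owned": 2},
    {"phone": "555-0101", "birth_date": None, "number_cars_owned": 7},
    {"phone": "555-0100", "birth_date": "1975-06-30", "number_cars_owned": 1},
]


def row_count(rows):
    """Total number of rows."""
    return len(rows)


def missing_count(rows, column):
    """Rows whose value in `column` is missing (None in this sketch)."""
    return sum(1 for r in rows if r.get(column) is None)


def invalid_count(rows, column, valid_min, valid_max):
    """Non-missing values outside the inclusive range [valid_min, valid_max]."""
    return sum(
        1
        for r in rows
        if r.get(column) is not None and not (valid_min <= r[column] <= valid_max)
    )


def duplicate_count(rows, column):
    """Distinct non-missing values that occur more than once (assumed semantics)."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sum(1 for n in counts.values() if n > 1)


# Each entry maps a check from the example to whether it passes on the sample rows.
results = {
    "row_count between 10 and 1000": 10 <= row_count(rows) <= 1000,
    "missing_count(birth_date) = 0": missing_count(rows, "birth_date") == 0,
    "invalid_count(number_cars_owned) = 0": invalid_count(rows, "number_cars_owned", 1, 6) == 0,
    "duplicate_count(phone) = 0": duplicate_count(rows, "phone") == 0,
}

for check, passed in results.items():
    print(("PASS" if passed else "FAIL"), check)
```

With these three sample rows every check fails, because the table is too small and contains one missing birth date, one out-of-range car count, and one repeated phone number.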
Access the Soda Core open-source documentation.
Last modified on 01-Jul-22