✔ An open-source CLI tool and Python library for data reliability
✔ Compatible with Soda Checks Language (SodaCL) and Soda Cloud
✔ Enables data quality testing both inside and outside your data pipeline, supporting data observability and reliability
✔ Enables programmatic scans on a time-based schedule
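In practice, time-based scans are often handed off to cron or an orchestrator such as Airflow. As a minimal sketch only, the standard-library `sched` module can queue repeated runs of a scan function. The names `run_scan`, `schedule_scans`, and `FakeClock` are hypothetical and exist only for illustration. The scan body is a placeholder, not a call into Soda Core.

```python
import sched


class FakeClock:
    """Deterministic clock so the example runs instantly instead of sleeping."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        # Advance simulated time instead of blocking.
        self.now += seconds


def schedule_scans(run_scan, interval_seconds, runs, scheduler):
    """Queue `runs` calls to `run_scan`, one every `interval_seconds`, then run them."""
    for i in range(runs):
        scheduler.enter(i * interval_seconds, 1, run_scan)
    scheduler.run()


clock = FakeClock()
scan_times = []


def run_scan():
    # Hypothetical stand-in: a real job would build and execute a Soda Core
    # scan here and act on the result (for example, alert on failed checks).
    scan_times.append(clock.time())


# Hourly scans, three runs, driven by the fake clock.
schedule_scans(run_scan, interval_seconds=3600, runs=3,
               scheduler=sched.scheduler(clock.time, clock.sleep))
print(scan_times)  # [0.0, 3600.0, 7200.0]
```

Constructing `sched.scheduler()` with its defaults instead of the fake clock makes the same loop wait in real time.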
```yaml
# Checks for basic validations
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
  - invalid_percent(phone) < 1 %:
      valid format: phone number
  - invalid_count(number_cars_owned) = 0:
      valid min: 1
      valid max: 6
  - duplicate_count(phone) = 0

checks for dim_product:
  - avg(safety_stock_level) > 50
  # Checks for schema changes
  - schema:
      name: Find forbidden, missing, or wrong type
      warn:
        when required column missing: [dealer_price, list_price]
        when forbidden column present: [credit_card]
        when wrong column type:
          standard_cost: money
      fail:
        when forbidden column present: [pii*]
        when wrong column index:
          model_name: 22
  # Check for freshness
  - freshness(start_date) < 1d

# Check for referential integrity
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)
```
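Soda Core evaluates checks like these by querying the data source, for example by generating SQL. As a rough mental model only, the following plain-Python sketch computes what several of these metrics measure over hypothetical in-memory rows. The function names and sample data are illustrative and are not part of Soda Core. Two details are assumptions: that `duplicate_count` counts distinct values occurring more than once, and that `between` bounds are inclusive. Consult the SodaCL reference for the exact definitions.

```python
from datetime import datetime, timedelta


def missing_count(rows, column):
    """Rows where the column is absent or None (treated as NULL)."""
    return sum(1 for r in rows if r.get(column) is None)


def invalid_count(rows, column, valid_min=None, valid_max=None):
    """Non-missing values outside [valid_min, valid_max]."""
    count = 0
    for r in rows:
        value = r.get(column)
        if value is None:
            continue  # missing values are not also counted as invalid
        too_low = valid_min is not None and value < valid_min
        too_high = valid_max is not None and value > valid_max
        if too_low or too_high:
            count += 1
    return count


def duplicate_count(rows, column):
    """Distinct non-missing values that occur more than once (assumed definition)."""
    frequencies = {}
    for r in rows:
        value = r.get(column)
        if value is not None:
            frequencies[value] = frequencies.get(value, 0) + 1
    return sum(1 for n in frequencies.values() if n > 1)


def freshness(rows, column, now):
    """Age of the most recent timestamp in the column, measured from `now`."""
    latest = max(r[column] for r in rows if r.get(column) is not None)
    return now - latest


def missing_references(rows, column, ref_rows, ref_column):
    """Non-missing values in `column` with no match in `ref_column`."""
    allowed = {r[ref_column] for r in ref_rows if r.get(ref_column) is not None}
    return sum(
        1 for r in rows
        if r.get(column) is not None and r[column] not in allowed
    )


# Hypothetical sample rows standing in for warehouse tables.
dim_customer = [
    {"birth_date": "1980-01-01", "phone": "555-0100", "number_cars_owned": 2},
    {"birth_date": None, "phone": "555-0101", "number_cars_owned": 0},
    {"birth_date": "1975-06-30", "phone": "555-0100", "number_cars_owned": 7},
]
dim_product = [
    {"start_date": datetime(2022, 9, 30, 1, 0)},
    {"start_date": datetime(2022, 9, 1, 0, 0)},
]
dim_department_group = [{"department_group_name": "Sales"},
                        {"department_group_name": "R&D"}]
dim_employee = [{"department_name": "Sales"}, {"department_name": "Finance"}]
now = datetime(2022, 9, 30, 12, 0)  # fixed "scan time" for reproducibility

results = {
    # Bounds treated as inclusive here (assumption).
    "row_count between 10 and 1000": 10 <= len(dim_customer) <= 1000,
    "missing_count(birth_date) = 0":
        missing_count(dim_customer, "birth_date") == 0,
    "invalid_count(number_cars_owned) = 0":
        invalid_count(dim_customer, "number_cars_owned",
                      valid_min=1, valid_max=6) == 0,
    "duplicate_count(phone) = 0": duplicate_count(dim_customer, "phone") == 0,
    "freshness(start_date) < 1d":
        freshness(dim_product, "start_date", now) < timedelta(days=1),
    "values in (department_group_name) must exist in dim_employee (department_name)":
        missing_references(dim_department_group, "department_group_name",
                           dim_employee, "department_name") == 0,
}

for check, passed in results.items():
    print(("pass" if passed else "FAIL") + ": " + check)
```

With these hypothetical rows, only the freshness check passes. The latest `start_date` is 11 hours old, while every other check finds at least one offending row. For a `= 0` threshold, both common readings of "duplicate count" (distinct duplicated values, or duplicated rows) give the same pass/fail result.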
Access the Soda Core open-source documentation.
Last modified on 30-Sep-22