✔ Human-readable, domain-specific language for data reliability
✔ Includes 25+ built-in metrics, plus the ability to include SQL queries
✔ Includes checks with dynamic thresholds to gauge changes to metrics over time
✔ Collaborate with your team to write SodaCL checks in a YAML file
# Checks for basic validations
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
  - invalid_percent(phone) < 1 %:
      valid format: phone number
  - invalid_count(number_cars_owned) = 0:
      valid min: 1
      valid max: 6
  - duplicate_count(phone) = 0

checks for dim_product:
  - avg(safety_stock_level) > 50
  # Checks for schema changes
  - schema:
      name: Find forbidden, missing, or wrong type
      warn:
        when required column missing: [dealer_price, list_price]
        when forbidden column present: [credit_card]
        when wrong column type:
          standard_cost: money
      fail:
        when forbidden column present: [pii*]
        when wrong column index:
          model_name: 22
  # Check for freshness
  - freshness (start_date) < 1d

# Check for referential integrity
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)
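To make the basic validation checks more concrete, here is a minimal plain-Python sketch of what a few of these metrics measure. It is not Soda's implementation: the sample rows, the helper functions, and the exact counting rules (for example, treating None as missing, and counting duplicate values rather than duplicate rows) are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical sample rows standing in for the dim_customer dataset.
rows = [
    {"birth_date": "1980-04-12", "phone": "555-0100", "number_cars_owned": 2},
    {"birth_date": None,         "phone": "555-0101", "number_cars_owned": 0},
    {"birth_date": "1975-09-30", "phone": "555-0100", "number_cars_owned": 7},
    {"birth_date": "1991-01-05", "phone": None,       "number_cars_owned": 1},
]


def row_count(rows):
    """Total number of rows in the dataset."""
    return len(rows)


def missing_count(rows, column):
    """Rows whose value in `column` is None (assumed to mean 'missing')."""
    return sum(1 for r in rows if r[column] is None)


def invalid_count(rows, column, valid_min, valid_max):
    """Non-missing values that fall outside [valid_min, valid_max]."""
    return sum(
        1
        for r in rows
        if r[column] is not None and not (valid_min <= r[column] <= valid_max)
    )


def duplicate_count(rows, column):
    """Distinct non-missing values that occur more than once (assumed rule)."""
    counts = Counter(r[column] for r in rows if r[column] is not None)
    return sum(1 for n in counts.values() if n > 1)


# Evaluate the same thresholds used in the checks for dim_customer.
results = {
    "row_count between 10 and 1000": 10 <= row_count(rows) <= 1000,
    "missing_count(birth_date) = 0": missing_count(rows, "birth_date") == 0,
    "invalid_count(number_cars_owned) = 0":
        invalid_count(rows, "number_cars_owned", valid_min=1, valid_max=6) == 0,
    "duplicate_count(phone) = 0": duplicate_count(rows, "phone") == 0,
}

for check, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {check}")
```

Each check pairs a metric with a threshold, so a check passes only when the measured value satisfies the comparison. With this deliberately flawed sample data every check fails, which is the kind of signal the checks are meant to surface.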
Last modified on 30-Sep-22