Last modified on 27-Sep-23
Soda Library is an extension of Soda Core. Leveraging all the power of Soda Core and SodaCL, it offers new features and functionality for Soda customers.
New with Soda Library
- Run Check Suggestions in the Soda Library CLI to profile your data and auto-generate basic checks for data quality.
- Use Group By configuration and Group By Evolution checks to organize data quality check results by category.
- Configure a Check Template to customize a metric you can reuse in multiple checks.
Existing customers can seamlessly migrate from Soda Core to Soda Library.
✔ A Python library and CLI tool for data quality testing
✔ Compatible with Soda Checks Language (SodaCL) and Soda Cloud
✔ Supports Check Suggestions to auto-generate basic quality checks tailored to your data
✔ Enables data quality testing both in your data pipeline and development workflows
✔ Extended from Soda Core, a free, open-source CLI and Python library on GitHub
```yaml
# Checks for basic validations
checks for dim_customer:
  - row_count between 10 and 1000
  - missing_count(birth_date) = 0
  - invalid_percent(phone) < 1 %:
      valid format: phone number
  - invalid_count(number_cars_owned) = 0:
      valid min: 1
      valid max: 6
  - duplicate_count(phone) = 0

checks for dim_product:
  - avg(safety_stock_level) > 50
```
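To make the meaning of the basic validation checks above more concrete, here is a minimal plain-Python sketch that computes comparable metrics over a small in-memory table. It is an illustration only, not Soda Library's implementation: the sample rows and the helper functions `missing_count`, `duplicate_count`, and `invalid_count_range` are hypothetical, and the duplicate-counting rule used here (distinct values that occur more than once) is an assumption.

```python
from collections import Counter

# Hypothetical sample rows standing in for the dim_customer dataset.
rows = [
    {"phone": "555-0100", "birth_date": "1980-01-01", "number_cars_owned": 2},
    {"phone": "555-0101", "birth_date": None, "number_cars_owned": 7},
    {"phone": "555-0100", "birth_date": "1975-06-30", "number_cars_owned": 1},
]


def missing_count(rows, column):
    """Count rows whose value in `column` is missing (None in this sketch)."""
    return sum(1 for row in rows if row.get(column) is None)


def duplicate_count(rows, column):
    """Count distinct non-missing values that occur more than once (assumed rule)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sum(1 for n in counts.values() if n > 1)


def invalid_count_range(rows, column, valid_min, valid_max):
    """Count non-missing values that fall outside [valid_min, valid_max]."""
    return sum(
        1
        for row in rows
        if row.get(column) is not None
        and not (valid_min <= row[column] <= valid_max)
    )


# Collect the metrics that the checks would compare against their thresholds.
results = {
    "row_count": len(rows),
    "missing_count(birth_date)": missing_count(rows, "birth_date"),
    "duplicate_count(phone)": duplicate_count(rows, "phone"),
    "invalid_count(number_cars_owned)": invalid_count_range(
        rows, "number_cars_owned", 1, 6
    ),
}
```

Each check then reduces to comparing one of these numbers with its threshold, for example `results["missing_count(birth_date)"] == 0`.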
```yaml
# Check for schema changes
checks for dim_product:
  - schema:
      name: Find forbidden, missing, or wrong type
      warn:
        when required column missing: [dealer_price, list_price]
        when forbidden column present: [credit_card]
        when wrong column type:
          standard_cost: money
      fail:
        when forbidden column present: [pii*]
        when wrong column index:
          model_name: 22
```
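The schema check above can be read as a set of comparisons between the expected and the actual columns of a dataset. The following plain-Python sketch illustrates that idea under stated assumptions: the `schema_issues` helper and the sample schema are hypothetical, wildcard matching is approximated with `fnmatchcase`, and column-index validation is left out. It does not reflect how Soda Library evaluates schema checks internally.

```python
from fnmatch import fnmatchcase


def schema_issues(actual, required=(), forbidden=(), expected_types=None):
    """Return a list of schema problems.

    `actual` maps column name -> data type (a hypothetical representation).
    """
    issues = []
    # Required columns that are absent from the dataset.
    for column in required:
        if column not in actual:
            issues.append(f"required column missing: {column}")
    # Forbidden columns, where a pattern such as "pii*" matches by wildcard.
    for pattern in forbidden:
        for column in actual:
            if fnmatchcase(column, pattern):
                issues.append(f"forbidden column present: {column}")
    # Columns whose data type differs from the expected one.
    for column, expected in (expected_types or {}).items():
        if column in actual and actual[column] != expected:
            issues.append(f"wrong column type: {column}")
    return issues


# Hypothetical schema for dim_product.
actual_schema = {
    "list_price": "money",
    "standard_cost": "numeric",
    "pii_email": "varchar",
}

warnings = schema_issues(
    actual_schema,
    required=["dealer_price", "list_price"],
    forbidden=["credit_card"],
    expected_types={"standard_cost": "money"},
)
failures = schema_issues(actual_schema, forbidden=["pii*"])
```

Keeping the `warn` and `fail` conditions as two separate calls mirrors the two alert levels in the check definition.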
```yaml
# Check for freshness
checks for dim_product:
  - freshness(start_date) < 1d
```
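As a rough illustration of the freshness check above, the sketch below measures the age of the most recent timestamp in a column and compares it with a one-day threshold. The `freshness` helper, the sample timestamps, and the fixed reference time `now` are assumptions made for this example; it is not Soda Library's implementation.

```python
from datetime import datetime, timedelta, timezone


def freshness(timestamps, now):
    """Return the age of the most recent timestamp relative to `now`."""
    return now - max(timestamps)


# Hypothetical reference time and start_date values.
now = datetime(2023, 9, 27, 12, 0, tzinfo=timezone.utc)
start_dates = [
    datetime(2023, 9, 25, 8, 0, tzinfo=timezone.utc),
    datetime(2023, 9, 27, 3, 0, tzinfo=timezone.utc),
]

age = freshness(start_dates, now)
# Equivalent of `freshness(start_date) < 1d` in this sketch.
is_fresh = age < timedelta(days=1)
```

Passing `now` explicitly keeps the example deterministic; a real scan would use the time at which it runs.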
```yaml
# Check for referential integrity
checks for dim_department_group:
  - values in (department_group_name) must exist in dim_employee (department_name)
```
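The referential integrity check above asks whether every value in one column also appears in a column of another dataset. A minimal plain-Python sketch of that comparison follows; the `missing_references` helper and the sample values are hypothetical, and skipping missing values is an assumption of this sketch.

```python
def missing_references(child_values, parent_values):
    """Return the sorted child values that do not exist among the parent values."""
    parent = set(parent_values)
    return sorted(
        {value for value in child_values if value is not None and value not in parent}
    )


# Hypothetical column contents.
department_group_names = ["Sales", "R&D", "Ops", None]
department_names = ["Sales", "Ops"]

orphans = missing_references(department_group_names, department_names)
# The check would pass only when no orphaned values remain.
passes = len(orphans) == 0
```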
Simplify the work of testing and maintaining good-quality data.
- Download the Soda Library (free 45-day trial!) and configure settings and data quality checks in two simple YAML files to start scanning your data within minutes.
- Connect Soda Library to over a dozen data sources to scan volumes of data for quality.
- Write data quality checks using SodaCL, a low-code, human-readable, domain-specific language for data quality management.
- Use the Soda Library to build programmatic scans that you can use in conjunction with orchestration tools like Airflow or Prefect to automate pipeline actions when data quality checks fail.
- Run the same scans for data quality in multiple environments such as development, staging, and production.
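The programmatic scans mentioned in the list above might look roughly like the following sketch, which wraps a scan in a function that an orchestration task could call to decide whether a pipeline continues. It assumes that Soda Library exposes the same `Scan` class and methods as Soda Core's Python API (`set_data_source_name`, `add_configuration_yaml_file`, `add_sodacl_yaml_file`, `execute`, `has_check_fails`). The function names, the data source name `my_datasource`, and the file names `configuration.yml` and `checks.yml` are hypothetical.

```python
def run_quality_gate(scan, data_source, config_path, checks_path):
    """Configure and execute a scan; return True when no checks fail.

    `scan` is expected to behave like Soda's `Scan` object (assumption).
    """
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file(file_path=config_path)
    scan.add_sodacl_yaml_file(checks_path)
    scan.execute()
    return not scan.has_check_fails()


def main():
    """Hypothetical entry point for an orchestration task."""
    # Imported lazily so this module loads even where Soda is not installed.
    from soda.scan import Scan

    ok = run_quality_gate(
        Scan(),
        data_source="my_datasource",    # hypothetical data source name
        config_path="configuration.yml",  # hypothetical file name
        checks_path="checks.yml",       # hypothetical file name
    )
    # A non-zero exit code lets the orchestrator stop downstream tasks.
    raise SystemExit(0 if ok else 1)
```

Passing the scan object in as a parameter keeps the gate logic easy to test with a stand-in object, and lets the same function run unchanged in development, staging, and production by swapping the configuration file.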
Documentation always applies to the latest version of Soda products