✔ Install from the command line
✔ Compatible with Snowflake, Amazon Redshift, BigQuery, and more
✔ Write tests in a YAML file
✔ Run programmatic scans to test data quality
✔ Deploy in an Airflow environment
Example scan YAML file
table_name: breakdowns
metrics:
  - row_count
  - missing_count
  - missing_percentage
  ...
# Validates that a table has rows
tests:
  - row_count > 0
# Tests that numbers in the column are entered in a valid format as whole numbers
columns:
  incident_number:
    valid_format: number_whole
    tests:
      - invalid_percentage == 0
  # Tests that no values in the column are missing
  school_year:
    tests:
      - missing_count == 0
  # Tests for duplicates in a column
  bus_no:
    tests:
      - duplicate_count == 0
# Compares row count between datasets
sql_metric:
  sql: |
    SELECT COUNT(*) as other_row_count
    FROM other_table
  tests:
    - row_count == other_row_count
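Each test in the file above is a comparison between a metric and either a number or another metric. The following Python sketch illustrates how such expressions can be read once the metrics have been computed. It is not Soda's implementation: the helpers `resolve` and `evaluate_test` and the sample metric values are hypothetical, and the sketch only handles simple `left operator right` expressions.

```python
import operator

# Comparison operators supported by this illustrative evaluator.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def resolve(token, metrics):
    """Return the metric value for a metric name, or parse a numeric literal."""
    if token in metrics:
        return metrics[token]
    return float(token)


def evaluate_test(expression, metrics):
    """Evaluate a 'left operator right' test expression against metric values."""
    left, op, right = expression.split()
    return OPERATORS[op](resolve(left, metrics), resolve(right, metrics))


# Hypothetical metric values, made up for illustration only.
metrics = {"row_count": 120, "other_row_count": 118, "missing_count": 0}

# Map each test expression to True (passed) or False (failed).
results = {
    test: evaluate_test(test, metrics)
    for test in [
        "row_count > 0",
        "missing_count == 0",
        "row_count == other_row_count",
    ]
}

for test, passed in results.items():
    print(f"{test}: {'passed' if passed else 'failed'}")
```

With these made-up values, the first two tests pass and the row count comparison fails, because the two tables hold different numbers of rows.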
Last modified on 01-Jul-22