Define programmatic scans
To automate the search for “bad” data, you can use the Soda Core Python library to programmatically execute scans.
Based on a set of conditions or a specific event schedule, you can instruct Soda Core to automatically scan a data source. For example, you may wish to scan your data at several points along your data pipeline, perhaps when new data enters a data source, after it is transformed, and before it is exported to another data source.
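As a minimal sketch of scanning at several points along a pipeline, the helper below runs one scan per stage using only the Scan methods shown on this page. The stage names, check file paths, and the names STAGE_CHECKS, new_soda_scan, and scan_stage are hypothetical; the sketch assumes a Soda Core package is installed wherever the default factory is used.

```python
from typing import Callable, Dict

# Hypothetical pipeline stages mapped to hypothetical SodaCL check files.
STAGE_CHECKS: Dict[str, str] = {
    "ingest": "./checks/ingest.yml",
    "transform": "./checks/transform.yml",
    "export": "./checks/export.yml",
}


def new_soda_scan():
    """Create a Soda Core Scan; imported lazily so this module loads without Soda installed."""
    from soda.scan import Scan
    return Scan()


def scan_stage(stage: str, data_source: str, config_path: str,
               scan_factory: Callable[[], object] = new_soda_scan):
    """Run the checks registered for one pipeline stage and return whatever execute() returns."""
    if stage not in STAGE_CHECKS:
        raise ValueError(f"No checks registered for stage {stage!r}")
    scan = scan_factory()
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file(file_path=config_path)
    scan.add_sodacl_yaml_file(STAGE_CHECKS[stage])
    return scan.execute()
```

A pipeline task could call scan_stage("transform", "events", "~/.soda/my_local_soda_environment.yml") right after its transformation step. Passing the factory as a parameter keeps the helper easy to exercise in unit tests without a live data source.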
You can save Soda Core scan results anywhere in your system; the scan_result object contains all the scan result information. To import the Soda Core library in Python so you can utilize the Scan() object, install a Soda Core package, then use from soda.scan import Scan.
Basic programmatic scan
from soda.scan import Scan
scan = Scan()
scan.set_data_source_name("events")
# Add configuration YAML files
#########################
# Multiple strategies available:
# 1) From a file
scan.add_configuration_yaml_file(file_path="~/.soda/my_local_soda_environment.yml")
# 2) From explicit environment variable(s)
scan.add_configuration_yaml_from_env_var(env_var_name="SODA_ENV")
# 3) From environment variables using a prefix
scan.add_configuration_yaml_from_env_vars(prefix="SODA_")
# 4) In code.
scan.add_configuration_yaml_str(
    """
    data_source events:
      type: snowflake
      connection:
        host: ${SNOWFLAKE_HOST}
        username: ${SNOWFLAKE_USERNAME}
        password: ${SNOWFLAKE_PASSWORD}
      database: events
      schema: public
    """
)
# Add check YAML files
##################
scan.add_sodacl_yaml_file("./my_programmatic_test_scan/sodacl_file_one.yml")
scan.add_sodacl_yaml_file("./my_programmatic_test_scan/sodacl_file_two.yml")
scan.add_sodacl_yaml_files("./my_scan_dir")
scan.add_sodacl_yaml_files("./my_scan_dir/sodacl_file_three.yml")
# Add variables
###############
scan.add_variables({"date": "2022-01-01"})
# Execute the scan
##################
scan.execute()
# Inspect the scan result
#########################
scan.assert_no_error_logs()
scan.assert_no_checks_fail()
scan.has_error_logs()
scan.get_error_logs_text()
scan.get_checks_fail()
scan.get_checks_fail_text()
scan.assert_no_checks_warn_or_fail()
scan.get_checks_warn_or_fail()
scan.has_checks_warn_or_fail()
scan.get_checks_warn_or_fail_text()
scan.get_all_checks_text()
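The inspection methods above can be combined into a small summary that is easy to log or store. The following is a sketch, not part of Soda Core: summarize_scan is a hypothetical helper that relies only on has_error_logs(), get_error_logs_text(), has_checks_warn_or_fail(), and get_checks_warn_or_fail_text() from the list above, and StubScan is a hypothetical stand-in for an executed scan so the example runs without a data source.

```python
def summarize_scan(scan) -> dict:
    """Collect a small, serializable summary from an already executed scan."""
    has_errors = scan.has_error_logs()
    has_issues = scan.has_checks_warn_or_fail()
    return {
        "ok": not (has_errors or has_issues),
        "error_logs": scan.get_error_logs_text() if has_errors else None,
        "check_issues": scan.get_checks_warn_or_fail_text() if has_issues else None,
    }


class StubScan:
    """Hypothetical stand-in exposing the same four methods as an executed Scan."""

    def __init__(self, errors: str = "", issues: str = ""):
        self._errors = errors
        self._issues = issues

    def has_error_logs(self) -> bool:
        return bool(self._errors)

    def get_error_logs_text(self) -> str:
        return self._errors

    def has_checks_warn_or_fail(self) -> bool:
        return bool(self._issues)

    def get_checks_warn_or_fail_text(self) -> str:
        return self._issues


# With a real scan, call summarize_scan(scan) after scan.execute().
summary = summarize_scan(StubScan(issues="row_count > 0 [FAILED]"))
```

Unlike the assert_* methods, which raise on failure, this approach returns data, which may suit pipelines that prefer to record an outcome and decide later how to react.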
Scan exit codes
Soda Core’s scan output includes an exit code which indicates the outcome of the scan.
Exit code | Description |
0 | All checks passed; no issues from either the runtime or Soda |
1 | Soda issued a warning on one or more checks |
2 | Soda issued a failure on one or more checks |
3 | Soda encountered a runtime issue |
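The exit codes in the table can drive a decision about whether to stop a pipeline. The sketch below is illustrative only: ScanExitCode and should_halt_pipeline are hypothetical names, and it assumes that the integer you pass in is the scan's exit code (for example, the value returned by scan.execute(), which is worth confirming for your installed version).

```python
from enum import IntEnum


class ScanExitCode(IntEnum):
    """Exit codes as listed in the table above."""
    PASSED = 0
    WARNED = 1
    FAILED = 2
    RUNTIME_ERROR = 3


def should_halt_pipeline(exit_code: int, halt_on_warn: bool = False) -> bool:
    """Return True when the pipeline should stop for the given exit code."""
    code = ScanExitCode(exit_code)  # raises ValueError for a code not in the table
    if code is ScanExitCode.PASSED:
        return False
    if code is ScanExitCode.WARNED:
        return halt_on_warn
    # Failed checks and runtime issues always halt in this sketch.
    return True
```

Treating warnings as non-blocking by default is a design choice made for this example; set halt_on_warn=True for stages where a warning should also stop downstream work.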
Last modified on 01-Jul-22