Link Search Menu Expand Document

Configure programmatic scans

To automate the search for “bad” data, you can use the Soda SQL Python library to programmatically execute scans.

Based on a set of conditions or a specific event schedule, you can instruct Soda SQL to automatically scan a data source for “bad” data. For example, you may wish to scan your data at several points along your data pipeline, perhaps when new data enters a data source, after it is transformed, and before it is exported to another data source.

You can save Soda SQL scan results anywhere in your system; the scan_result object contains all the scan result information.

Execute a programmatic scan based on Soda SQL’s default directory structure:

scan_builder = ScanBuilder()
scan_builder.scan_yml_file = 'tables/my_table.yml'
# scan_builder automatically finds the warehouse.yml in the parent directory of the scan YAML file
# scan_builder.warehouse_yml_file = '../warehouse.yml'
scan = scan_builder.build()
scan_result = scan.execute()
if scan_result.has_test_failures():
    print('Scan has test failures, stop the pipeline')

Execute a programmatic scan using dicts:

scan_builder = ScanBuilder()
scan_builder.warehouse_yml_dict = {
    'name': 'my_warehouse_name',
    'connection': {
        'type': 'snowflake',
        ...
    }
}
scan_builder.scan_yml_dict = {
    ...
}
scan = scan_builder.build()
scan_result = scan.execute()
if scan_result.has_test_failures():
    print('Scan has test failures, stop the pipeline')

Go further



Last modified on 16-Jul-21

Was this documentation helpful?
Give us your feedback in the #soda-docs channel in the Soda community on Slack or open an issue in GitHub.