Link Search Menu Expand Document

Connect Soda to a local file using Dask

Last modified on 19-Sep-24

For use with programmatic Soda scans, only.
Refer to Connect Soda to Dask and Pandas.

Define a programmatic scan to use Soda to scan a local file for data quality. Refer to the following example that executes a simple check for row count of the dataset.

import dask.dataframe as dd
from soda.scan import Scan

# Create Soda Library Scan object and set a few required properties
scan = Scan()
scan.set_scan_definition_name("test")
scan.set_data_source_name("dask")

# Read a `cities` CSV file with columns 'city', 'population'
ddf = dd.read_csv('cities.csv')

scan.add_dask_dataframe(dataset_name="cities", dask_df=ddf)

# Define checks using SodaCL

checks = """
checks for cities:
    - row_count > 0
"""

# Add the checks to the scan and set output to verbose
scan.add_sodacl_yaml_str(checks)

scan.set_verbose(True)

# Execute the scan
scan.execute()

# Inspect the scan object to review scan results
scan.get_scan_results()




Was this documentation helpful?

What could we do to improve this page?

Documentation always applies to the latest version of Soda products
Last modified on 19-Sep-24