Connect Soda to a local file using Dask

Last modified on 11-Jun-24

For use with programmatic Soda scans only.
Refer to Connect Soda to Dask and Pandas.

Define a programmatic scan so that Soda can check the data quality of a local file. The following example reads a CSV file into a Dask DataFrame and executes a simple check on the dataset's row count.
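Before running the scan example, you need a `cities.csv` file with the columns `city` and `population` in the working directory. If you don't have one, the following minimal sketch generates a small sample file using only the Python standard library. The function name `write_sample_cities_csv` and the row values are hypothetical placeholders for illustration. They are not real population figures.

```python
import csv
from pathlib import Path


def write_sample_cities_csv(path: str = "cities.csv") -> Path:
    """Write a small, hypothetical CSV file with the columns 'city' and 'population'."""
    # Placeholder rows for illustration only; replace them with your own data.
    rows = [
        {"city": "City A", "population": 1000},
        {"city": "City B", "population": 2500},
        {"city": "City C", "population": 4000},
    ]
    out = Path(path)
    # newline="" lets the csv module control line endings, as its docs recommend.
    with out.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["city", "population"])
        writer.writeheader()
        writer.writerows(rows)
    return out
```

Call `write_sample_cities_csv()` once before running the scan. Because the file then contains three data rows, the `row_count > 0` check should pass.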

import dask.dataframe as dd
from soda.scan import Scan

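# Create Soda Library Scan object and set a few required properties
scan = Scan()
scan.set_scan_definition_name("test")
scan.set_data_source_name("dask")

# Read a `cities` CSV file with columns 'city', 'population'
ddf = dd.read_csv('cities.csv')

# Register the Dask DataFrame with the scan as the dataset "cities"
scan.add_dask_dataframe(dataset_name="cities", dask_df=ddf)

# Define checks using SodaCL
checks = """
checks for cities:
    - row_count > 0
"""

# Add the checks to the scan and set output to verbose
scan.add_sodacl_yaml_str(checks)
scan.set_verbose(True)

# Execute the scan
scan.execute()

# Inspect the scan object to review scan results
print(scan.get_scan_results())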
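When you run a scan like this in a pipeline, you may want the job to stop if a check fails. The following minimal sketch assumes that the Soda Library `Scan` object exposes `get_logs_text()`, `has_error_logs()`, and `has_check_fails()`. Verify these methods against the version you have installed; `scan.assert_no_checks_fail()` may be an alternative. The helper name `run_and_gate` is hypothetical. It expects a `Scan` that already has its data source, Dask DataFrame, and SodaCL checks attached, as in the example above.

```python
def run_and_gate(scan) -> None:
    """Execute a configured Soda scan and stop the process if it errors or a check fails.

    Hypothetical helper; assumes `scan` behaves like soda.scan.Scan.
    """
    scan.execute()
    # Print the scan logs so that any failures are visible in the job output.
    print(scan.get_logs_text())
    # Runtime problems (for example, a missing file or bad SodaCL) show up as error logs.
    if scan.has_error_logs():
        raise SystemExit("Soda scan reported errors")
    # A check that evaluates to "fail" should halt the pipeline.
    if scan.has_check_fails():
        raise SystemExit("One or more Soda checks failed")
```

Raising `SystemExit` with a message exits the Python process with a non-zero status. Most schedulers and CI systems treat this as a failed step.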

