
Connect Soda to a local file using Dask

Last modified on 11-Jun-24

For use with programmatic Soda scans only.
For configuration details, refer to Connect Soda to Dask and Pandas.

Define a programmatic scan to have Soda check a local file for data quality. The following example loads a CSV file into a Dask DataFrame and runs a simple row-count check against the resulting dataset.

import dask.dataframe as dd
from soda.scan import Scan

# Create Soda Library Scan object and set a few required properties
scan = Scan()
scan.set_scan_definition_name("test")
scan.set_data_source_name("dask")

# Read a `cities` CSV file with columns 'city', 'population'
ddf = dd.read_csv('cities.csv')

scan.add_dask_dataframe(dataset_name="cities", dask_df=ddf)

# Define checks using SodaCL

checks = """
checks for cities:
    - row_count > 0
"""

# Add the checks to the scan and set output to verbose
scan.add_sodacl_yaml_str(checks)
scan.set_verbose(True)

# Execute the scan
scan.execute()

# Inspect the scan object to review scan results
scan.get_scan_results()
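
The example above expects a `cities.csv` file with the columns 'city' and 'population' in the working directory. To try the scan without real data, a minimal sketch like the following can generate a placeholder file using only the Python standard library. The helper name `write_sample_cities_csv` and the sample rows are illustrative assumptions, not part of Soda or of any real dataset.

```python
import csv
from pathlib import Path


def write_sample_cities_csv(path: Path = Path("cities.csv")) -> Path:
    """Write a small placeholder CSV with 'city' and 'population' columns.

    The rows are made-up values, meant only to give the row_count check
    something to count.
    """
    rows = [
        ("Alpha City", 1000),
        ("Beta Town", 2500),
        ("Gamma Village", 300),
    ]
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["city", "population"])  # header matches the scan example
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Creates cities.csv in the current working directory
    write_sample_cities_csv()
```

Running this once before the scan gives `dd.read_csv('cities.csv')` a file to load, so a check such as `row_count > 0` has rows to evaluate.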

