Reroute failed row samples
Learn how to programmatically use Soda Library with an example script to reroute failed row samples to the CLI output instead of Soda Cloud.
Using Soda Library, you can programmatically run scans that reroute failed row samples to the command line instead of sending them to Soda Cloud.
By default, Soda Library implicitly pushes samples of any failed rows to Soda Cloud for missing, validity, duplicate, and reference checks; see About failed row samples. Instead of sending those samples to Soda Cloud, you can use a custom sampler written in Python to programmatically instruct Soda to display them in the command line.
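Assuming the custom sampler can read the column names and rows of each failed-row sample, its last job is to format them for the terminal. The sketch below shows only that formatting step, using the standard library so it runs without Soda installed. The function name `format_failed_rows` and its parameters are hypothetical; in a real script this logic would sit inside your custom sampler class.

```python
import csv
import io
from typing import Iterable, Sequence


def format_failed_rows(
    check_name: str,
    column_names: Sequence[str],
    rows: Iterable[Sequence[object]],
) -> str:
    """Render one failed-row sample as labelled CSV text for the command line.

    Hypothetical helper: a custom sampler would call something like this
    instead of sending the sample to Soda Cloud.
    """
    buffer = io.StringIO()
    # Label each sample so several samples stay distinguishable in the output.
    buffer.write(f"Failed rows for check: {check_name}\n")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(column_names)
    writer.writerows(rows)  # None values are written as empty fields
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(
        format_failed_rows(
            "Alpha2 Country Codes must be valid",
            ["id", "country_code"],
            [[3, "USA"], [7, "x1"]],
        ),
        end="",
    )
```

Returning a string rather than printing directly keeps the same helper usable if you later route samples to a file or another non-Soda Cloud location.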
Follow the instructions below to modify an example script and run it locally. The script invokes Soda to scan example data, then displays samples in the command line for the rows that failed missing, validity, duplicate, and reference checks. The example uses Dask and Pandas to convert CSV sample data into a DataFrame on which Soda can run a scan, and to convert failed row samples into CSV so that you can display them in the command line or route them to a location other than Soda Cloud.
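To make "rows that failed missing, validity, and duplicate checks" concrete, the sketch below parses a small CSV with the standard library `csv` module (standing in for the Dask/Pandas DataFrame the real script uses) and groups rows by the kind of check they would fail. The column name `country_code`, the two-uppercase-letters validity rule, and the sample data are assumptions for illustration; Soda evaluates the real checks itself, and reference checks are omitted here because they need a second dataset.

```python
import csv
import io
import re
from collections import Counter
from typing import Dict, List

# Assumed validity rule for an alpha-2 country code: exactly two uppercase letters.
ALPHA2 = re.compile(r"[A-Z]{2}")

# Made-up sample data for illustration.
SAMPLE_CSV = """id,country_code
1,US
2,
3,USA
4,FR
5,FR
"""


def find_failed_rows(csv_text: str, column: str) -> Dict[str, List[dict]]:
    """Group rows by the kind of check they would fail on `column`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Count non-empty values up front so duplicates can be flagged per row.
    counts = Counter(r[column] for r in rows if r[column])
    failed: Dict[str, List[dict]] = {"missing": [], "invalid": [], "duplicate": []}
    for row in rows:
        value = row[column]
        if not value:
            failed["missing"].append(row)
            continue  # a missing value is not also reported as invalid
        if not ALPHA2.fullmatch(value):
            failed["invalid"].append(row)
        if counts[value] > 1:
            failed["duplicate"].append(row)
    return failed


if __name__ == "__main__":
    for check, rows in find_failed_rows(SAMPLE_CSV, "country_code").items():
        print(check, [r["id"] for r in rows])
```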
Note that although the example does not send failed row samples to Soda Cloud, it does still send dataset profile information and the data quality check results to Soda Cloud.
Prerequisites
a code or text editor such as PyCharm or Visual Studio Code
Python 3.8, 3.9, or 3.10
Pip 21.0 or greater
Set up and run example script
In a browser, navigate to cloud.soda.io/signup to create a new Soda account, which is free for a 45-day trial. If you already have a Soda account, log in.
Navigate to your avatar > Profile, then access the API keys tab. Click the plus icon to generate new API keys. Copy+paste the API key values to a temporary, secure place in your local environment.
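Rather than pasting the API key values directly into the script, you can keep them in environment variables and build the Soda Cloud configuration at run time. This sketch assumes the variable names `SODA_API_KEY_ID` and `SODA_API_KEY_SECRET` (hypothetical; choose your own) and emits YAML with `soda_cloud` keys `host`, `api_key_id`, and `api_key_secret`; confirm those key names against the configuration your script actually uses.

```python
import os


def soda_cloud_config_yaml(host: str = "cloud.soda.io") -> str:
    """Build a soda_cloud configuration snippet from environment variables.

    SODA_API_KEY_ID and SODA_API_KEY_SECRET are assumed (hypothetical) names.
    Raises KeyError if either variable is unset, so a missing key fails loudly.
    """
    key_id = os.environ["SODA_API_KEY_ID"]
    key_secret = os.environ["SODA_API_KEY_SECRET"]
    return (
        "soda_cloud:\n"
        f"  host: {host}\n"
        f"  api_key_id: {key_id}\n"
        f"  api_key_secret: {key_secret}\n"
    )


if __name__ == "__main__":
    config = soda_cloud_config_yaml()
    # Print only the non-secret host line to confirm the config was built.
    print(config.splitlines()[1].strip())
```

Keeping the secret out of the file means you can share or commit the script without leaking credentials.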
Best practice dictates that you run Soda in a virtual environment. From the command line, create a new directory in your environment, then use the following commands to create, then activate, a virtual environment called .sodadataframes.
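If you prefer to create the environment from Python rather than the shell, the standard-library `venv` module does the same job as `python -m venv`. This is a sketch; you still activate the environment from your shell afterwards, for example with `source .sodadataframes/bin/activate` on macOS or Linux.

```python
import sys
import venv
from pathlib import Path


def create_env(path: str = ".sodadataframes", with_pip: bool = True) -> Path:
    """Create a virtual environment at `path` and return its absolute directory."""
    env_dir = Path(path).resolve()
    # clear=False leaves an existing environment in place instead of wiping it.
    venv.create(env_dir, with_pip=with_pip, clear=False)
    return env_dir


if __name__ == "__main__":
    env = create_env()
    scripts = "Scripts" if sys.platform == "win32" else "bin"
    print(f"Created {env}; activate it with {env / scripts / 'activate'}")
```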
Run the following commands to upgrade pip, then install Soda Library for Dask and Pandas.
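The same two steps can be scripted with `subprocess`, running pip through `sys.executable` so the packages install into whichever interpreter runs the script (ideally the activated virtual environment). The package name `soda-pandas-dask` and the index URL `https://pypi.cloud.soda.io` are assumptions in this sketch; use the exact install command from the Soda documentation for your version.

```python
import subprocess
import sys
from typing import List

# Assumed package name and index; check them against Soda's install instructions.
SODA_PACKAGE = "soda-pandas-dask"
SODA_INDEX_URL = "https://pypi.cloud.soda.io"


def install_commands(
    package: str = SODA_PACKAGE, index_url: str = SODA_INDEX_URL
) -> List[List[str]]:
    """Return the pip commands: first upgrade pip, then install the Soda package."""
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["install", "--upgrade", "pip"],
        pip + ["install", "-i", index_url, package],
    ]


if __name__ == "__main__":
    for cmd in install_commands():
        # check=True stops at the first command that fails.
        subprocess.run(cmd, check=True)
```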
Copy + paste the script below into a new Soda-dask-pandas-example.py file in the same directory in which you created your virtual environment. In the file, replace the above-the-line values with your own Soda Cloud values, then save the file.
From the command line, use the following command to run the example and see both the scan results and the failed row samples as command-line output.
Output:
In your Soda Cloud account, navigate to Datasets, then click to open soda.pandas.example. Soda displays the check results for the scan you just executed via the command-line. If you wish, click the Columns tab to view the dataset profile information Soda Library collected and pushed to Soda Cloud.

Click the "Alpha2 Country Codes must be valid" row to view the latest check result, which failed. Note that Soda Cloud does not display a Failed Rows Analysis tab, which would normally contain samples of failed rows from the scan.

Example script
Go further
Learn how to Manage sensitive data in Soda Cloud.
Learn how to Disable failed rows sampling for specific columns.
Learn how to disable samples in Soda Cloud entirely.
Learn how to use a custom sampler to route failed row samples to an external storage location.
Not quite ready for this big gulp of Soda? 🥤 Try taking a sip first.
