Link Search Menu Expand Document

Failed rows checks

Use a failed rows check to explicitly send samples of rows that failed a check to Soda Cloud.

checks for dim_customer:
# Failed rows defined using common table expression
  - failed rows:
      samples limit: 50
      fail condition: total_children = '2' and number_cars_owned >= 3
# Failed rows defined using SQL query
  - failed rows:
      fail query: |
        SELECT DISTINCT geography_key
        FROM dim_customer as customer

Prerequisites
About failed row samples
Define failed rows checks
Optional check configurations
Go further

Prerequisites

To use failed row checks, you must have a Soda Cloud account connected to Soda Core. See Connect Soda Core to Soda Cloud for details.

About failed row samples

When a scan results in a failed check, the CLI output displays information about the check that failed and why. To offer more insight into the data that failed a check, Soda Cloud displays failed rows in a check result’s history.

There are two ways you can configure a SodaCL check to send failed row samples to your Soda Cloud account:

  1. Use a duplicate_count metric, a missing metric, or a validity metric in your check. Checks that use these metrics automatically send failed row samples to your Soda Cloud account.
  2. Use failed rows checks in your to explicitly send failed rows to Soda Cloud. Read on!

For security, you can also disable the failed row samples feature entirely; see Disable failed row samples for details.

Define failed rows checks

In the context of SodaCL check types, failed row checks are user-defined. This check is limited in its syntax variation, but you can customize your expression or query as much as you like.

The example below uses common table expression (CTE) to define the fail condition that any rows in the dim_customer dataset must meet in order to qualify as failed rows, during a scan, get sent to Soda Cloud.

In this rather silly example, Soda Core sends any rows which contain the value 2 in the total_children column and which contain a value greater than or equal to 3 in the number_cars_owned column to Soda Cloud as failed row samples. The check also uses the name key to customize a name for the check so that it displays in a more readable form in Soda Cloud; see image below.

checks for dim_customer:
  - failed rows:
      name: Failed rows with CTE
      fail condition: total_children = '2' and number_cars_owned >= 3

failed-rows-CTE


If you prefer, you can use a SQL query to define what qualifies as a failed row for Soda Core to send to Soda Cloud, as in the following simple example.

checks for dim_customer:
  - failed rows:
      name: Failed rows query test
      fail query: |
        SELECT DISTINCT geography_key
        FROM dim_customer as customer

failed-rows-SQL


By default, Soda Core sends 100 sample rows to Soda Cloud. You can limit the number of sample rows that Soda Core using the samples limit configuration key:value pair, as in the following example.

checks for dim_customer:
  - failed rows:
      samples limit: 50
      fail condition: total_children = '2' and number_cars_owned >= 3

Optional check configurations

Configuration Documentation
Define a name for a failed rows check; see example. Customize check names
  Define alert configurations to specify warn and fail alert conditions. -
  Apply a filter to return results for a specific portion of the data in your dataset. -
Use quotes when identifying dataset or column names; see example Use quotes in a check
Use wildcard characters in the value in the check. Use wildcard values as you would with CTE or SQL.
  Use for each to apply schema checks to multiple datasets in one scan. -
Apply a dataset filter to partition data during a scan; see example. Scan a portion of your dataset

Example with check name

checks for dim_customer:
  - failed rows:
      name: Failed rows query test
      fail query: |
        SELECT DISTINCT geography_key
        FROM dim_customer as customer

Example with quotes

checks for dim_customer:
  - failed rows:
      name: Failed rows query test
      fail query: |
        SELECT DISTINCT "geography_key"
        FROM dim_customer as customer

Example with dataset filter

coming soon


Go further


Last modified on 01-Jul-22

Was this documentation helpful?
Share feedback in the Soda community on Slack.

Help improve our docs!

  • Request a docs change.
  • Edit this page in our GitHub repo.