Send sample data to Soda Cloud
Use the sample datasets configuration in your checks YAML file to send 100 sample rows to Soda Cloud. Examine the sample rows to gain insight into the type of checks you can prepare to test for data quality.
Requires Soda Cloud.
sample datasets:
  datasets:
    - dim_customer
    - include prod%
    - exclude test%
Prerequisites
Define sample datasets
Optional check configurations
Go further
Prerequisites
- You have installed a Soda Core package in your environment.
- You have configured Soda Core to connect to a data source using a configuration.yml file.
- You have created and connected a Soda Cloud account to Soda Core.
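For orientation, the second and third prerequisites might look roughly like the following in a configuration.yml file. This is only a sketch: the data source name my_datasource, the postgres type, the connection values, and the environment variable names are all placeholders, and the exact connection keys differ by data source type and Soda Core version.

```yaml
# configuration.yml (illustrative sketch; all values are placeholders)
data_source my_datasource:        # hypothetical data source name
  type: postgres                  # assumed data source type
  connection:
    host: localhost
    port: "5432"
    username: ${POSTGRES_USER}    # assumed environment variable names
    password: ${POSTGRES_PASSWORD}
  database: postgres
  schema: public

# Connection to Soda Cloud, which receives the sample rows
soda_cloud:
  host: cloud.soda.io
  api_key_id: ${SODA_API_KEY_ID}
  api_key_secret: ${SODA_API_KEY_SECRET}
```

Reading credentials from environment variables keeps secrets out of the file.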
Define sample datasets
The syntax of this configuration varies little. The only mutable parts are the dataset names, with optional include or exclude keywords, that specify the datasets from which Soda Core gathers sample rows to send to Soda Cloud.
The example configuration below uses a wildcard character (%) to specify that Soda Core sends sample rows to Soda Cloud for all datasets with names that begin with customer, and does not send samples for any dataset with a name that begins with test.
sample datasets:
  datasets:
    - include customer%
    - exclude test%
You can also specify individual datasets to include or exclude, as in the following example.
sample datasets:
  datasets:
    - include retail_orders
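Because the sample datasets configuration lives in the checks YAML file, it can sit alongside ordinary checks for the same dataset. A minimal sketch, assuming a file named checks.yml and a row_count check chosen purely for illustration:

```yaml
# checks.yml (hypothetical file name)

# Send sample rows for this dataset to Soda Cloud
sample datasets:
  datasets:
    - include retail_orders

# An ordinary check on the same dataset, included only for illustration
checks for retail_orders:
  - row_count > 0
```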
Scan results in Soda Cloud
- To review the sample rows in Soda Cloud, first run a scan of your data source so that Soda Core can gather and send samples to Soda Cloud.
- In Soda Cloud, navigate to the Datasets dashboard, then click a dataset name to open the dataset’s info page.
- Access the Sample Data tab to review the sample rows.
Optional check configurations
| Supported | Configuration | Documentation |
|---|---|---|
|   | Define a name for sample data configuration. | - |
|   | Define alert configurations to specify warn and fail thresholds. | - |
|   | Apply a filter to return results for a specific portion of the data in your dataset. | - |
| ✓ | Use quotes when identifying dataset names; see example. | Use quotes in a check |
| ✓ | Use wildcard characters (%) with dataset names in the check; see example. | - |
|   | Use for each to apply anomaly score checks to multiple datasets in one scan. | - |
|   | Apply a dataset filter to partition data during a scan. | - |
Example with quotes
sample datasets:
  datasets:
    - include "prod_customer"
Example with wildcards
sample datasets:
  datasets:
    - include prod%
    - exclude test%
Go further
- Need help? Join the Soda community on Slack.
- Use a freshness check to gauge how recently your data was captured.
- Use reference checks to compare the values of one column to another.
Last modified on 10-Aug-22