
Display sample data for a dataset

When you create new monitors in Soda Cloud, it can help to review sample data from your dataset to decide which kinds of tests to run when Soda SQL scans your data. For this reason, you may wish to enable sample data.


Using the information Soda Cloud discovered about your datasets during its first scan of your data, you can optionally instruct it to capture sample data for specific datasets during the next scheduled scan. When sample data is enabled, Soda Cloud displays sample rows from the dataset (to a maximum of 1000) so that you can make informed choices about the tests to run against your data when you create a monitor.

DO NOT enable sample data if your dataset contains sensitive information or personally identifiable information (PII).

  1. From the Datasets dashboard, open the dataset for which you want to enable sample data.
  2. Click the Sample data tab, then check Enable Sample Data so that Soda Cloud captures sample data for the dataset during its next scan. If you see a message that asks you to review time partitioning settings before enabling sample data, click the link, then follow the instructions to review and set the time partitioning settings for the dataset.
  3. When Soda Cloud completes its next scan, review the sample data to gain insight into the contents of your dataset and to decide how you want to test it when you create a new monitor.
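Before following the steps above, it may help to screen a dataset's column names for likely sensitive fields, given the warning about PII. The following is a minimal sketch of one way to do that and is not part of Soda Cloud or Soda SQL. The `PII_HINTS` list and the `flag_possible_pii` function are hypothetical names introduced for illustration. A name-based check is only a rough first pass and cannot confirm that a dataset is free of sensitive information.

```python
# Hypothetical list of name fragments that often indicate PII.
# This is an assumption for illustration; adjust it for your own data.
PII_HINTS = ("name", "email", "phone", "address", "ssn", "birth", "passport")


def flag_possible_pii(columns):
    """Return the column names that contain any fragment in PII_HINTS.

    The match is a case-insensitive substring check on the column name only;
    it does not inspect the data itself.
    """
    flagged = []
    for column in columns:
        lowered = column.lower()
        if any(hint in lowered for hint in PII_HINTS):
            flagged.append(column)
    return flagged


if __name__ == "__main__":
    # Example column names, invented for this sketch.
    example_columns = ["order_id", "customer_email", "birth_date", "total_amount"]
    print(flag_possible_pii(example_columns))
```

If the function flags any columns, review the dataset manually before deciding whether to enable sample data for it.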


Last modified on 15-Sep-21
