When you create new monitors in Soda Cloud, reviewing sample data from your dataset can help you decide which tests to run when Soda SQL scans your data; see the image below. For this reason, you may wish to enable sample data.
Using the information that Soda Cloud discovered about your datasets during its first scan of your data, you can optionally instruct it to capture sample data for specific datasets during the next scheduled scan. When sample data is enabled, Soda Cloud displays sample rows from the dataset (up to a maximum of 1000) so that you can make informed choices about the tests to run against your data when you create a monitor.
DO NOT enable sample data if your dataset contains sensitive information or personally identifiable information (PII).
- From the Datasets dashboard, open the dataset in which you want to enable sample data.
- Click the Sample data tab, then check Enable Sample Data to enable Soda Cloud to capture sample data for the dataset during its next scan. If you see a message that asks you to review time partitioning settings before enabling sample data, click the link, then follow the instructions to review and set the time partitioning settings for the dataset.
- When Soda Cloud completes its next scan, use the sample data to gain some insight into the data contained in your dataset and help you determine the ways in which you want to test it when you create a new monitor.
- If you use Soda SQL, you can add a `samples` configuration key to your scan YAML file to send sample data.
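  As a rough sketch, a scan YAML file that sends sample data might look like the following. The table name `orders` is a placeholder, and the `table_limit` and `failed_limit` sub-keys and their values are assumptions rather than details stated on this page; check the Soda SQL documentation for the keys your version supports.

  ```yaml
  # Hypothetical scan YAML file; `orders` is a placeholder table name.
  table_name: orders
  metrics:
    - row_count
  samples:
    # Assumed sub-key: number of sample rows to send for the dataset.
    table_limit: 50
    # Assumed sub-key: number of failed-row samples to send per test.
    failed_limit: 50
  ```

  As with enabling sample data in Soda Cloud, leave this key out for datasets that contain sensitive information or PII.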
- Read more about failed row samples in Soda Cloud.
- Sign up for a free Soda Cloud account.
- Create monitors in Soda Cloud.
- Learn more about Soda Cloud architecture.
- Need help? Join the Soda community on Slack.
Last modified on 15-Sep-21