When creating new monitors in Soda Cloud, you may find it useful to review sample data from your dataset to help you determine the kinds of tests to run when Soda SQL scans your data; see the image below. For this reason, you may wish to configure a samples configuration key in Soda SQL.
Alternatively, you can Enable Sample Data directly in your Soda Cloud account. Refer to Display sample data for details.
DO NOT use sample data if your dataset contains sensitive information or personally identifiable information (PII).
- If you have not already done so, connect Soda SQL to your Soda Cloud account.
- Add a samples configuration key to your scan YAML file according to the Scan YAML example below; use table_limit to define the maximum number of rows in a dataset that Soda SQL sends to Soda Cloud after it executes a test during a scan. These rows appear as a sample of the data from your dataset in the Sample Data tab when you are creating a new monitor; see image above.
- Save the changes to your scan YAML file, then run a scan on that dataset.
soda scan warehouse.yml tables/orders.yml
- In your Soda Cloud account, navigate to the Monitors dashboard. Click the stacked-dots icon to Create Monitor. Note that in the first step of the guided monitor creation, you can review sample data from your dataset that Soda SQL collected during its last scan of your dataset.
    table_name: orders
    metrics:
      - row_count
      - missing_count
      - missing_percentage
      - values_count
      ...
    samples:
      table_limit: 50
    tests:
      - row_count > 0
    columns:
      orderid:
        valid_format: uuid
        tests:
          - invalid_percentage <= 3
Using the example scan YAML above, the scan executes both tests against all the data in the dataset, but it sends a maximum of 50 rows of data and metadata to Soda Cloud for review as sample data when creating a new monitor for the orders dataset.
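The difference between testing all rows and sampling only some of them can be illustrated with a small, self-contained sketch. It does not use Soda SQL; it assumes an in-memory SQLite table as a hypothetical stand-in for the warehouse, and the `TABLE_LIMIT` constant and the generated `order-N` values are made up for illustration.

```python
import sqlite3

TABLE_LIMIT = 50  # mirrors samples.table_limit in the scan YAML

# Hypothetical stand-in for the warehouse: an in-memory table with 193 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid TEXT)")
conn.executemany(
    "INSERT INTO orders (orderid) VALUES (?)",
    [(f"order-{i}",) for i in range(193)],
)

# A test such as `row_count > 0` is evaluated against every row in the dataset.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
test_passed = row_count > 0

# The sample, by contrast, is capped at TABLE_LIMIT rows.
sample = conn.execute("SELECT * FROM orders LIMIT ?", (TABLE_LIMIT,)).fetchall()

print(f"row_count={row_count}, test_passed={test_passed}, sample_size={len(sample)}")
# prints: row_count=193, test_passed=True, sample_size=50
```

The point of the sketch is only that the row limit affects what is collected as a sample, not what the tests evaluate.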
The snippet below displays the CLI output of the query that collects the sample; the dataset contains 193 rows, but Soda SQL sends only 50 of them as a sample to Soda Cloud.
    | ...
    | Executing SQL query: SELECT * FROM "public"."orders" LIMIT 50;
    | SQL took 0:00:00.074957
    | Sent sample orders.sample (50/193) to Soda Cloud
    | ...
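If you capture scan output in a script, you might confirm how many rows were sent by parsing the "Sent sample" line. The sketch below is a minimal example that assumes the line always has the shape shown in the output above; that shape is inferred from this single example rather than from a documented log format, and `parse_sent_sample` is a hypothetical helper, not part of Soda SQL.

```python
import re

# Pattern inferred from the single "Sent sample" line in the CLI output;
# this is an assumption, not a documented log format.
SENT_SAMPLE = re.compile(
    r"Sent sample (?P<name>\S+) \((?P<sent>\d+)/(?P<total>\d+)\) to Soda Cloud"
)


def parse_sent_sample(line):
    """Return (sample_name, rows_sent, total_rows), or None if the line does not match."""
    match = SENT_SAMPLE.search(line)
    if match is None:
        return None
    return match["name"], int(match["sent"]), int(match["total"])


line = "  | Sent sample orders.sample (50/193) to Soda Cloud"
print(parse_sent_sample(line))
# prints: ('orders.sample', 50, 193)
```

Lines that do not match, such as the "SQL took" timing line, return None, so the helper can be applied to every line of the output.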
- Read more about failed row samples in Soda Cloud.
- Sign up for a free Soda Cloud account.
- Create monitors in Soda Cloud.
- Learn how to display sample data for datasets in Soda Cloud.
- Learn more about Soda Cloud architecture.
- Need help? Join the Soda community on Slack.
Last modified on 15-Sep-21