
Define time partitioning for dataset scans

By default, Soda Cloud scans all the data in a dataset every time it runs a scan. To limit a scan to the data added since the previous scan, define a time partition for the dataset. Time partitioning is useful, for example, for a dataset that holds a large volume of static data and regularly receives small volumes of new data.

  1. In the Datasets dashboard, click the stacked dots icon of the dataset you wish to edit, then select Edit Dataset.
  2. In the Time Partitioning tab, use a SQL WHERE clause to define a time partition for scans using two ISO 8601 variables: {{ prevScanTime }} and {{ scanTime }}.
  3. Save your changes, then wait for Soda Cloud to complete the next scan of your dataset according to its schedule.
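
To make step 2 concrete, the sketch below shows what a time partition filter might look like and which rows it would select. It is an illustration only, not Soda Cloud's implementation: the `orders` table, the `created_at` column, the quoting around the variables, and the `render_filter` and `rows_in_partition` helpers are all assumptions made for this example. The substitution step simply mimics replacing {{ prevScanTime }} and {{ scanTime }} with ISO 8601 timestamps, and SQLite stands in for your warehouse.

```python
import re
import sqlite3

# Hypothetical filter as it might be typed into the Time Partitioning tab.
# `created_at` is an assumed column name; adjust it to your own dataset.
PARTITION_FILTER = (
    "created_at > '{{ prevScanTime }}' AND created_at <= '{{ scanTime }}'"
)


def render_filter(template: str, prev_scan_time: str, scan_time: str) -> str:
    """Replace the two variables with ISO 8601 timestamps (illustrative only)."""
    values = {"prevScanTime": prev_scan_time, "scanTime": scan_time}
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


def rows_in_partition(conn: sqlite3.Connection, where_clause: str) -> list:
    """Return the ids of the rows that the rendered filter selects."""
    query = f"SELECT id FROM orders WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


# In-memory stand-in for a dataset: one old row, one new row, one future row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2021-09-13T08:00:00"),
        (2, "2021-09-14T12:30:00"),
        (3, "2021-09-15T09:15:00"),
    ],
)

where_clause = render_filter(
    PARTITION_FILTER, "2021-09-14T00:00:00", "2021-09-15T00:00:00"
)
print(where_clause)
print(rows_in_partition(conn, where_clause))
```

With a previous scan time of 2021-09-14T00:00:00 and a current scan time of 2021-09-15T00:00:00, only the row added between the two scans (id 2) falls inside the partition. The half-open interval (`>` on the lower bound, `<=` on the upper bound) is a design choice for this sketch so that a row is never counted in two consecutive scans.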

Last modified on 15-Sep-21
