Scan time & scan time strategy

Selecting the right scan time is essential for accurate data monitoring and reliable metric collection. Scans that occur too early may run before the data has been fully loaded into the database, leading to false positives or misleading results. This guide outlines how to determine the best scan time based on your data load patterns and operational needs.

Key considerations

Data load completion time

When is the database load expected to be complete?

  • Determine when the relevant tables or datasets are expected to be fully loaded.

  • Factor in common variances: if a load is expected to complete by 00:00 UTC but occasionally finishes at 00:10 UTC, account for the expected, albeit sporadic, delay.

Knowing this helps avoid scanning too early and capturing incomplete data.
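One way to make "common variances" concrete is to summarize how late past loads actually finished. The sketch below is illustrative only: it assumes you can export, for each past load, the number of minutes it finished after the expected completion time. The function name `completion_offset` and the sample history are hypothetical, not part of any product API.

```python
import math

def completion_offset(delays_minutes, quantile=0.95):
    """Return the delay (minutes past the expected completion time) that
    covers the given fraction of past loads, using the nearest-rank method."""
    if not delays_minutes:
        raise ValueError("need at least one observed load")
    if not 0 < quantile <= 1:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(delays_minutes)
    rank = math.ceil(quantile * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Hypothetical history: minutes past 00:00 UTC at which each nightly load finished.
observed = [0, 0, 1, 0, 2, 0, 10, 0, 1, 0]
typical = completion_offset(observed, quantile=0.5)     # delay covering half the loads
worst_case = completion_offset(observed, quantile=1.0)  # latest load seen so far
print(f"typical delay: {typical} min, worst observed: {worst_case} min")
```

A scan scheduled only after the typical delay will sometimes see incomplete data; scheduling after the worst observed delay trades freshness for fewer false alerts.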

Acceptable load delay tolerance

When is a delayed load considered late or "problematic"?

  • If data arriving by 02:30 UTC is still valid for monitoring purposes, it may be better to delay the scan to reduce false alerts.

  • Scanning immediately after the earliest expected load time is not always necessary.

Understanding what qualifies as "late data" helps define the tolerance window for scan timing.

Response window & team availability

How soon after the load completes can someone respond to issues flagged by monitors?

  • If nobody can take action until 09:00 UTC, scanning earlier may not be useful unless scans feed downstream processes or dashboards.

Choose a scan time that aligns with both data readiness and team readiness.


Example scenario

  • Expected load completion: 00:00 UTC

  • Occasional load delay: up to 00:00 UTC

  • Team available from: 08:00 UTC

Scan options

| Strategy              | Scan time | Rationale                                                                    |
|-----------------------|-----------|------------------------------------------------------------------------------|
| Minimal buffer        | 00:15 UTC | Captures data soon after load with minor delay tolerance.                    |
| Conservative buffer   | 01:30 UTC | Allows extra time for delayed loads and reduces the risk of false positives. |
| Operationally aligned | 07:30 UTC | Ensures scan results are fresh and complete when the team starts reviewing.  |
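The three scan options can be reproduced with simple time arithmetic. The sketch below is an illustration rather than a product feature: the function name `scan_options` is hypothetical, and the buffer values (15 minutes, 90 minutes, and a 30-minute lead before the team starts) are simply read off the scenario above, not recommended defaults.

```python
from datetime import datetime, timedelta, timezone

def scan_options(expected_load, team_start,
                 minimal_buffer=timedelta(minutes=15),
                 conservative_buffer=timedelta(minutes=90),
                 review_lead=timedelta(minutes=30)):
    """Return candidate scan times for the three strategies."""
    return {
        # Shortly after the expected load completion.
        "Minimal buffer": expected_load + minimal_buffer,
        # Leaves extra room for delayed loads.
        "Conservative buffer": expected_load + conservative_buffer,
        # Finishes shortly before the team is available to respond.
        "Operationally aligned": team_start - review_lead,
    }

day = datetime(2024, 1, 15, tzinfo=timezone.utc)  # arbitrary example date
options = scan_options(expected_load=day, team_start=day.replace(hour=8))
for name, when in options.items():
    print(f"{name}: {when:%H:%M} UTC")
```

With a load expected at 00:00 UTC and a team available from 08:00 UTC, this yields 00:15, 01:30, and 07:30 UTC. Changing the buffers shows how each strategy shifts as load patterns change.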

Scan scheduling at scale

When scanning large volumes of tables:

  • It is acceptable to configure many scans, even thousands of tables, for the same scheduled time (e.g. 00:00 UTC).

  • The scheduled time is a logical time. The system distributes execution based on queuing and available resources, so actual execution is staggered.
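To build intuition for why scans that share a scheduled time still start at different moments, the toy model below feeds a queue of scans to a fixed pool of workers. It is only a sketch under stated assumptions: the worker count, the scan durations, and the first-in-first-out policy are hypothetical, and the system's actual scheduler may behave differently.

```python
import heapq

def simulate_start_offsets(scan_durations, workers):
    """Return each scan's start offset (seconds after the shared scheduled
    time) when scans are taken in order by a fixed pool of workers."""
    if workers < 1:
        raise ValueError("need at least one worker")
    free_at = [0.0] * workers  # min-heap: time at which each worker becomes free
    heapq.heapify(free_at)
    starts = []
    for duration in scan_durations:
        start = heapq.heappop(free_at)  # earliest available worker takes the scan
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts

# Six scans of 30 seconds each, all scheduled for the same time, two workers.
print(simulate_start_offsets([30.0] * 6, workers=2))
```

In this model the first two scans start immediately and the rest wait for a free worker, so the last scans begin a minute after the scheduled time even though all six share the same schedule.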


Historical scans

  • Historical metric collection scans (used to backfill metric baselines) run only once, at configuration time.

  • These scans are not governed by the scan schedule, and they are typically the most resource-intensive.


Best practices

  • Consistency is key: Using the same scan time every day establishes a stable baseline for anomaly detection.

  • Early scans should be avoided: Scheduling scans before the last acceptable load time is not recommended unless business needs require it.

  • Time zones should be centralized: Aligning scan time with the database time zone is ideal, especially when your time partitioning column is based on the insert/load time in that time zone.

  • Monitoring and adjusting: If load patterns or SLAs change, scan times should be revisited and adjusted accordingly.
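When the database time zone differs from UTC, it can help to work out what a database-local scan time corresponds to in UTC before entering it in a schedule. The sketch below assumes a database running at a fixed UTC offset; the function name `scan_time_in_utc` and the UTC-5 offset are hypothetical examples. A fixed offset ignores daylight saving time; for named zones, Python's standard `zoneinfo` module (3.9+) can be used instead.

```python
from datetime import date, datetime, time, timedelta, timezone

def scan_time_in_utc(local_scan_time, db_utc_offset, on_date):
    """Convert a scan time expressed in the database's time zone to UTC."""
    db_zone = timezone(db_utc_offset)  # fixed-offset zone (no DST handling)
    local_dt = datetime.combine(on_date, local_scan_time, tzinfo=db_zone)
    return local_dt.astimezone(timezone.utc)

# Hypothetical database at UTC-5 with a scan at 01:30 database time.
utc_dt = scan_time_in_utc(time(1, 30), timedelta(hours=-5), date(2024, 1, 15))
print(f"{utc_dt:%Y-%m-%d %H:%M} UTC")
```

Here 01:30 in the database zone corresponds to 06:30 UTC on the same date, which is the value to compare against load completion times recorded in UTC.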
