Metadata data sources

For each supported data source, Soda always calculates two metrics:

  • Total row count

  • Last modification time


Oracle

  • Row count: metadata row count is actually a count(*)

Soda does not use metadata for this metric in Oracle. It requires an additional package and/or is unreliable based on the schedule of that package.

  • Last insertion time: Soda uses metadata

Note that past data is only available for a limited amount of time, which varies depending on the system. The minimum goes back 120 h.

  • Historical backfilling: not possible.

Postgres

Metadata is supported, but it requires some additional setup on Postgres side.

  • Row count: enabled out of the box.

  • Last insertion time: track_commit_timestamp must be enabled: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-TRACK-COMMIT-TIMESTAMP

    • If track_commit_timestamp is not enabled, Soda will return a warning.

  • Historical backfilling: not possible.

BigQuery

Metadata metrics are available and supported in BigQuery.

  • Historical backfilling: possible.

  • Partition column: can be suggested based on metadata available in BigQuery.

    • Soda will prioritize user-suggested columns.

    • If there are no user-suggested columns, Soda will try a metadata approach to find the partition column automatically.

    • If there are no columns found in the metadata of BigQuery, Soda will fall back on its own heuristic.

Partition column availability in BigQuery:

If the user has configured a partitioning column on BigQuery's side, Soda will use it (given that it is a date/timestamp column).

Otherwise, Soda will fall back on a standard sampling method to detect the partition column.

Redshift

  • Historical backfilling is supported on Redshift and it is limited to 7 days for the metadata.

Last updated

Was this helpful?