Metadata data sources

Supported data sources

The table below summarizes the capabilities supported for each data source, including metadata and metadata history:

Data source
Onboarding
Metadata
Metadata history
Querying data

Athena

Azure Synapse

BigQuery

Databricks

Dremio

Upcoming

MS Fabric

MS SQL server

MySQL

Upcoming

Oracle

PostgreSQL

Redshift

Snowflake

Trino

Upcoming

Limitations & edge cases

Some data sources limit how metadata can be collected or retrieved. Keep these caveats in mind when choosing which data source(s) to connect to Soda.

Oracle

  • Historical backfilling: not possible.

  • Row count: calculated via count(*). Soda does not use metadata for this metric in Oracle, because doing so requires an additional package and can be unreliable depending on that package's schedule.

  • Last modification time: Soda uses metadata.

Note that past data is only available for a limited time, which varies by system. At minimum, it goes back 120 hours.

Postgres

Metadata is supported, but it requires some additional setup on the Postgres side.

  • Historical backfilling: not possible.

  • Row count: enabled out-of-the-box.

  • Last modification time: requires track_commit_timestamp to be enabled. See https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-TRACK-COMMIT-TIMESTAMP

    • If track_commit_timestamp is not enabled, Soda will return a warning.
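
To confirm the setting before connecting Soda, the server can be asked directly. The following is a minimal sketch, assuming a DB-API 2.0 connection to PostgreSQL (for example, one created with a driver such as psycopg). The helper name `commit_timestamps_enabled` is hypothetical and is not part of Soda. `SHOW track_commit_timestamp` is a standard PostgreSQL statement.

```python
def commit_timestamps_enabled(conn) -> bool:
    """Return True if the server reports track_commit_timestamp = on.

    `conn` is assumed to be any DB-API 2.0 connection to PostgreSQL.
    Hypothetical helper for illustration; not part of Soda.
    """
    cur = conn.cursor()
    try:
        cur.execute("SHOW track_commit_timestamp")
        row = cur.fetchone()
    finally:
        cur.close()
    # PostgreSQL reports boolean settings as the strings "on" / "off".
    return row is not None and str(row[0]).strip().lower() == "on"
```

If the helper returns False, the parameter can be turned on in postgresql.conf (or with `ALTER SYSTEM SET track_commit_timestamp = on;`), followed by a server restart. Commit timestamps are only recorded for transactions committed after the setting takes effect.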


BigQuery

Metadata metrics are available and supported in BigQuery.

  • Historical backfilling: possible.

  • Partition column: can be suggested based on metadata available in BigQuery.

    • Soda will prioritize user-suggested columns.

    • If there are no user-suggested columns, Soda will try a metadata approach to find the partition column automatically.

    • If there are no columns found in the metadata of BigQuery, Soda will fall back on its own heuristic.

Partition column availability in BigQuery:

If the user has configured a partitioning column on BigQuery's side, Soda will use it (given that it is a date/timestamp column).

Otherwise, Soda will fall back on a standard sampling method to detect the partition column.
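
The priority order described above (user-suggested column, then BigQuery metadata, then a heuristic) can be sketched as a simple fallback chain. This is an illustration only: the function name, its parameters, and the set of accepted type names are assumptions, and it does not reflect Soda's internal implementation.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed set of date/timestamp type names accepted for a partition column.
DATE_LIKE_TYPES = {"DATE", "DATETIME", "TIMESTAMP"}


def choose_partition_column(
    user_suggested: Sequence[str],
    metadata_column: Optional[Tuple[str, str]],
    heuristic: Callable[[], Optional[str]],
) -> Optional[str]:
    """Pick a partition column following the documented priority order.

    user_suggested:  column names supplied by the user (highest priority).
    metadata_column: (name, type) of the partitioning column found in
                     BigQuery metadata, or None if the table has none.
    heuristic:       callable used as the last resort (e.g. sampling).
    """
    # 1. User-suggested columns always win.
    if user_suggested:
        return user_suggested[0]
    # 2. Use the configured partitioning column if it is date-like.
    if metadata_column is not None:
        name, col_type = metadata_column
        if col_type.upper() in DATE_LIKE_TYPES:
            return name
    # 3. Otherwise fall back on the heuristic.
    return heuristic()
```

For example, `choose_partition_column([], ("event_ts", "TIMESTAMP"), lambda: None)` returns `"event_ts"`, while a table partitioned on an integer column falls through to the heuristic.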


Redshift

  • Historical backfilling: possible, but limited to the last 7 days of metadata.

  • Last modification time: does not reflect schema changes. Only the following operations are tracked:

    • inserts

    • updates

    • deletes
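
In practice, the 7-day limit acts as a clamp on how far back a backfill can start. A minimal sketch, using a hypothetical helper name and taking the 7-day value from the limit stated above; it is not Soda's implementation:

```python
from datetime import datetime, timedelta, timezone

# Metadata retention for Redshift as stated in this section.
REDSHIFT_METADATA_RETENTION = timedelta(days=7)


def clamp_backfill_start(requested_start: datetime, now: datetime) -> datetime:
    """Return the earliest point from which metadata can be backfilled."""
    earliest_available = now - REDSHIFT_METADATA_RETENTION
    # Requests older than the retention window are moved up to its edge.
    return max(requested_start, earliest_available)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
requested = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(clamp_backfill_start(requested, now))  # 2024-01-03 00:00:00+00:00
```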


Synapse

Synapse does not provide metadata history tables.

  • Historical backfilling: not possible.

  • Last modification time: not possible.

  • Row count: current row counts are calculated via count(*). Soda does not use metadata for this metric in Synapse.

  • Quartile metrics (Q1, median, Q3): not possible, because Synapse does not support them.
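
For sources such as Synapse (and Oracle above) where row counts come from count(*) rather than from metadata, the query scans the table itself. The sketch below illustrates the idea using Python's built-in sqlite3 as a stand-in database. The `row_count` helper and the `orders` table are hypothetical, and the exact query Soda issues may differ.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Count rows by querying the table instead of reading metadata."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]


# In-memory stand-in for a real warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
print(row_count(conn, "orders"))  # 3
```

Because the count is computed at query time, it always reflects the current state of the table, which is consistent with historical backfilling not being possible for these sources.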

