
Connect Soda to GCP BigQuery

For Soda to run quality scans of your data, you must configure it to connect to your data source.

  • For Soda Core, add the connection configurations to your configuration.yml file. Read more.
  • For Soda Cloud (preview), add the connection configurations to step 3 of the New Data Source workflow. Read more.

Connection configuration
Authentication methods
Supported data types

A note about BigQuery datasets: Google uses the term “dataset” slightly differently from Soda and many others.

  • In the context of Soda, a dataset is a representation of a tabular data structure with rows and columns. A dataset can take the form of a table in PostgreSQL or Snowflake, a stream in Kafka, or a DataFrame in a Spark application.
  • In the context of BigQuery, a dataset is “a top-level container that is used to organize and control access to your tables and views. A table or view must belong to a dataset…”

Instances of “dataset” in Soda documentation always reference the former.
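
To make the distinction concrete: a fully qualified BigQuery table reference has the form project.dataset.table. The middle part is the BigQuery dataset (the container), and the last part is what Soda calls a dataset. The sketch below is illustrative only. The function name and the sample values my-project and orders are hypothetical and are not part of Soda.

```python
def qualified_table_name(project_id: str, bigquery_dataset: str, soda_dataset: str) -> str:
    """Compose a BigQuery table reference of the form project.dataset.table.

    bigquery_dataset is the BigQuery container; soda_dataset is the table
    that Soda documentation refers to as a "dataset".
    """
    return f"{project_id}.{bigquery_dataset}.{soda_dataset}"


# Hypothetical values: "sodacore" is the BigQuery dataset from the connection
# configuration, "orders" stands in for a table that Soda scans as a dataset.
print(qualified_table_name("my-project", "sodacore", "orders"))
# prints "my-project.sodacore.orders"
```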

Connection configuration

data_source my_datasource_name:
  type: bigquery
  connection:
    account_info_json: '{
        "type": "service_account",
        "project_id": "...",
        "private_key_id": "...",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
        "client_email": "...@project.iam.gserviceaccount.com",
        "client_id": "...",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
      }'
    auth_scopes:
    - https://www.googleapis.com/auth/bigquery
    - https://www.googleapis.com/auth/cloud-platform
    - https://www.googleapis.com/auth/drive
    project_id: "..."
    dataset: sodacore
Property                         Required
type                             required
account_info_json                optional; inline properties listed below; if not provided, Soda uses Google Application Default Credentials
  type                           required
  project_id                     required
  private_key_id                 required
  private_key                    required
  client_email                   required
  client_id                      required
  auth_uri                       required
  token_uri                      required
  auth_provider_x509_cert_url    required
  client_x509_cert_url           required
auth_scopes                      optional; Soda applies the three scopes listed above by default
project_id                       optional; overrides project_id from account_info_json
dataset                          required
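
The following is a minimal sketch, not part of Soda, of how you might check a service account key before pasting it into account_info_json. It assumes you have already loaded the key into a Python dict. The helper names (missing_key_fields, to_account_info_json, effective_project_id) are hypothetical, and the override rule in the last function simply restates the project_id row of the table above.

```python
import json

# The ten inline properties that the table above marks as required.
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "client_id",
    "auth_uri",
    "token_uri",
    "auth_provider_x509_cert_url",
    "client_x509_cert_url",
)


def missing_key_fields(account_info: dict) -> list:
    """Return the required fields that are absent or empty, in table order."""
    return [name for name in REQUIRED_KEY_FIELDS if not account_info.get(name)]


def to_account_info_json(account_info: dict) -> str:
    """Serialize the key to a single line of JSON for account_info_json.

    Raises ValueError if a required field is missing.
    """
    missing = missing_key_fields(account_info)
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # json.dumps escapes newlines in private_key as the two characters \n,
    # so the result stays on one line.
    return json.dumps(account_info)


def effective_project_id(connection: dict) -> str:
    """Return the connection-level project_id if set, else the one in the key."""
    if connection.get("project_id"):
        return connection["project_id"]
    return json.loads(connection["account_info_json"])["project_id"]
```

Because a single-quoted YAML scalar treats backslashes literally, the \n sequences in private_key reach the JSON parser unchanged. A literal single quote inside the JSON would need to be doubled ('') in the YAML.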

Authentication methods

You can authenticate the connection to GCP BigQuery using one of the following methods.

  1. Application Default Credentials
  2. Application Default Credentials with Service Account impersonation
  3. Service Account Key (see connection configuration above)
  4. Service Account Key with Service Account impersonation


Application Default Credentials

Add the use_context_auth property to your connection configuration, as per the following example.

data_source my_datasource:
  type: bigquery
  connection:
    use_context_auth: True


Application Default Credentials with Service Account impersonation

Add the use_context_auth and impersonation_account properties to your connection configuration, as per the following example.

data_source my_datasource:
  type: bigquery
  connection:
    use_context_auth: True
    impersonation_account: <SA_EMAIL>


Service Account Key with Service Account impersonation

Add the impersonation_account property to your connection configuration, as per the following example.

data_source my_database_name:
  type: bigquery
  connection:
    account_info_json: '{
        "type": "service_account",
        "project_id": "...",
        "private_key_id": "...",
      ...}'
    impersonation_account: <SA_EMAIL>
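
The four methods above differ only in which connection properties are set. As a rough way to see this, the sketch below names the method implied by a connection mapping. It is a hypothetical helper for illustration, not Soda's actual resolution logic. In particular, the precedence it gives use_context_auth when account_info_json is also present is an assumption.

```python
def authentication_method(connection: dict) -> str:
    """Name the authentication method implied by a connection mapping."""
    if connection.get("use_context_auth"):
        # Assumption: use_context_auth wins if both properties are present.
        base = "Application Default Credentials"
    elif connection.get("account_info_json"):
        base = "Service Account Key"
    else:
        # Per the property table, Soda uses Application Default Credentials
        # when account_info_json is not provided.
        base = "Application Default Credentials"

    if connection.get("impersonation_account"):
        return base + " with Service Account impersonation"
    return base


# Usage with a placeholder service account email (hypothetical value).
print(authentication_method({
    "use_context_auth": True,
    "impersonation_account": "sa@example.iam.gserviceaccount.com",
}))
# prints "Application Default Credentials with Service Account impersonation"
```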


Supported data types

Category  Data type
text      STRING
number    INT64, DECIMAL, BIGNUMERIC, BIGDECIMAL, FLOAT64
time      DATE, DATETIME, TIME, TIMESTAMP






Last modified on 30-Sep-22