# BigQuery

### Connection configuration reference

Install the following package:

```bash
pip install soda-bigquery
```

#### Data source YAML

Provide **exactly one** of the following:

* `account_info_json` – inline BigQuery service account JSON
* `account_info_json_path` – path to a BigQuery service account JSON file

{% hint style="info" %}
If `use_context_auth: true`, Soda uses **Application Default Credentials (ADC)** and neither option is required.
{% endhint %}

**Provide `account_info_json`:**

<pre class="language-yaml" data-title="ds_config.yml"><code class="lang-yaml">type: bigquery
name: my_bigquery
connection:
<strong>    account_info_json: '{
</strong><strong>    "type": "service_account",
</strong><strong>    "project_id": "dbt-quickstart-44203",
</strong><strong>    "private_key_id": "fe0a60e9cb7d4369f73f7b5691ce397d1e",
</strong><strong>    "private_key": "-----BEGIN PRIVATE KEY-----&#x3C;insert-private-key>-----END PRIVATE KEY-----\n",
</strong><strong>    "client_email": "dbt-user@dbt-quickstart-448203.iam.gserviceaccount.com",
</strong><strong>    "client_id": "114963712293161062",
</strong><strong>    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
</strong><strong>    "token_uri": "https://oauth2.googleapis.com/token",
</strong><strong>    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
</strong><strong>    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/dbt-user%40dbt-quickstart-44803.iam.gserviceaccount.com",
</strong><strong>    "universe_domain": "googleapis.com"
</strong><strong>  }' # example service account JSON string, exported from BQ
</strong>  dataset: ${env.BQ_DATASET_NAME}
  # optional
  project_id: ${env.BQ_PROJECT_ID}        # Defaults to the one embedded in the account JSON
  storage_project_id: ${env.BQ_STORAGE_PROJECT_ID}
  location: ${env.BQ_LOCATION}            # Defaults to the specified project's location
  auth_scopes:
    - https://www.googleapis.com/auth/bigquery
    - https://www.googleapis.com/auth/cloud-platform
    - https://www.googleapis.com/auth/drive
  client_options: &#x3C;options-dict-for-bq-client>
  labels: &#x3C;labels-dict-for-bq-client>
  impersonation_account: &#x3C;name-of-impersonation-account>
  delegates: &#x3C;list-of-delegates-names>
  ##use_context_auth: false     # whether to use Application Default Credentials

</code></pre>

**Provide `account_info_json_path`:**

**Create the config file:**

```shellscript
soda data-source create -f ds_config.yml
```

The data source configuration YAML should look like the following:

<pre class="language-yaml" data-title="ds_config.yml"><code class="lang-yaml">type: bigquery
name: my_bigquery
connection:
<strong>  account_info_json_path: /path/to/service-account.json
</strong>  dataset: ${env.BQ_DATASET_NAME}
    # optional
  project_id: ${env.BQ_PROJECT_ID}        # Defaults to the one embedded in the account JSON
  storage_project_id: ${env.BQ_STORAGE_PROJECT_ID}
  location: ${env.BQ_LOCATION}            # Defaults to the specified project's location
  auth_scopes:
    - https://www.googleapis.com/auth/bigquery
    - https://www.googleapis.com/auth/cloud-platform
    - https://www.googleapis.com/auth/drive
  client_options: &#x3C;options-dict-for-bq-client>
  labels: &#x3C;labels-dict-for-bq-client>
  impersonation_account: &#x3C;name-of-impersonation-account>
  delegates: &#x3C;list-of-delegates-names>
  ##use_context_auth: false                # whether to use Application Default Credentials
</code></pre>

{% hint style="info" %}
**Note:** Google uses the term "dataset" differently than Soda:

* In the context of Soda, a [dataset](https://app.gitbook.com/s/oV0A6Eua8LUIyWgHxsjf/learning-resources/glossary#dataset) is a representation of a tabular data structure with rows and columns, such as a table, view, or data frame.
* In the context of BigQuery, a [dataset](https://cloud.google.com/bigquery/docs/datasets-intro) is “a top-level container that is used to organize and control access to your tables and views. A table or view must belong to a dataset…”

Instances of "dataset" in Soda documentation always reference the former.
{% endhint %}

> * See [Google BigQuery Integration parameters](https://cloud.google.com/chronicle/docs/soar/marketplace-integrations/google-big-query#integration_parameters)
> * See[ BigQuery's locations documentation](https://cloud.google.com/bigquery/docs/locations) to learn more about `location`.

#### Connection test

Test the data source connection:

```bash
soda data-source test -ds ds_config.yml
```

***

## Limitations

### **`DATE` vs `TIMESTAMP` partition columns**

* Soda **expects a `TIMESTAMP`-based partition column** for full compatibility with partition-based monitoring and freshness calculations in BigQuery.
* If your dataset uses a `DATE` column as the partition key, you must **explicitly cast** the column to `TIMESTAMP` to ensure valid SQL execution.

#### Recommended approaches:

* **Preferred**: Use a `TIMESTAMP` column as the time partition column when onboarding BigQuery datasets.
* **Alternative**: If a `DATE` column must be used, apply an explicit cast to `TIMESTAMP` in filters or expressions.

### Sharded tables

Soda Cloud treats each date-sharded table in BigQuery as an independent dataset. In practice, tables named with date suffixes (for example, `events_20240617`, `events_20240618`) will:

* **Count separately** in the UI, appearing as distinct entries in the Datasets list.
* **Count separately** toward Soda pricing, since each shard is billed as its own dataset.
* **Maintain independent schemas**, so any structural differences between shards must be managed manually (for example, by aligning column names and types before onboarding).

There is no built-in metadata or grouping mechanism in Soda Cloud to unify shards under a “main” logical table.

#### Recommended migration to native partitions

BigQuery now recommends using **time-partitioned tables** instead of legacy sharding. Partitioned tables offer better performance, lower cost, and a single logical table for both queries and metadata. To migrate:

1. Follow BigQuery’s guide to [convert date-sharded tables into a partitioned table](https://cloud.google.com/bigquery/docs/creating-partitioned-tables#convert-date-sharded-tables).
2. In Soda Cloud, onboard the new partitioned table once. Soda will treat it as a single dataset.

#### Short-term workarounds

If migration to partitioned tables is not yet feasible for you:

* **Table-name patterns:** Onboard your shards as separate datasets, but use consistent naming conventions and Dashboard filters to group them visually.
* **External catalog consolidation:** If you use a metadata layer, such as Atlan, to virtualize shards into a single logical asset, you can point Soda at that consolidated view. Keep in mind Soda will still count it as one dataset only if it surfaces as a single table name in BigQuery.

<br>

***

{% if (visitor.claims.plan === 'datasetStandard')%}
{% hint style="success" %}
You are **logged in to Soda** and seeing the **Free license** documentation. Learn more about [documentation-access-and-licensing](https://docs.soda.io/reference/documentation-access-and-licensing "mention").
{% endhint %}
{% endif %}

{% if (visitor.claims.plan === 'enterprise')%}
{% hint style="success" %}
You are **logged in to Soda** and seeing the **Team license** documentation. Learn more about [documentation-access-and-licensing](https://docs.soda.io/reference/documentation-access-and-licensing "mention").
{% endhint %}
{% endif %}

{% if (visitor.claims.plan === 'enterpriseUserBased')%}
{% hint style="success" %}
You are **logged in to Soda** and seeing the **Enterprise license** documentation. Learn more about [documentation-access-and-licensing](https://docs.soda.io/reference/documentation-access-and-licensing "mention").
{% endhint %}
{% endif %}

{% if !(visitor.claims.plan === 'enterprise' || visitor.claims.plan === 'enterpriseUserBased' || visitor.claims.plan === 'datasetStandard')%}
{% hint style="info" %}
You are **not logged in to Soda** and are viewing the default public documentation. Learn more about [documentation-access-and-licensing](https://docs.soda.io/reference/documentation-access-and-licensing "mention").
{% endhint %}
{% endif %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.soda.io/reference/data-source-reference-for-soda-core/bigquery.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
