# Sigma

Connect Soda Cloud to Sigma to build live reporting dashboards on top of your data quality results. You can create dashboards that visualize Soda check results, track failed rows over time, and report on the overall health of your data assets.

<figure><img src="https://1123167021-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FA2PmHkO5cBgeRPdiPPOG%2Fuploads%2F0cohc1LIf0tinRLHNkIh%2Fimage.png?alt=media&#x26;token=77e2c4b8-f836-48c4-a03c-fbd5060ff53d" alt=""><figcaption></figcaption></figure>

The integration works by exposing Soda data such as check results, scan times, and failed rows. Once Sigma is connected to a Soda data source, such as a Diagnostics Warehouse or a warehouse populated from the Soda Cloud API, you can build and share dashboards directly in Sigma without any additional export or scripting steps.

You can use the Soda Cloud API to extract check and dataset results programmatically and write them to a data warehouse of your choice, which Sigma can then query.

### Prerequisites

* **Python 3.8+**
* **Pip 21.0+**
* **A Sigma account** with permission to create workbooks and connect data sources
* **A Soda Cloud account** with at least one data source configured and scans running
  * Permission in Soda Cloud to access dataset metadata; see [global-and-dataset-roles](https://docs.soda.io/organization-and-admin-settings/global-and-dataset-roles "mention")
  * Soda Cloud API keys. See [generate-api-keys](https://docs.soda.io/reference/generate-api-keys "mention")

***

## Soda Cloud Reporting API dashboard

### Set up a Python script

{% stepper %}
{% step %}
Install the required dependencies:

```bash
pip install pandas requests
pip install "snowflake-connector-python[pandas]"
```

{% endstep %}

{% step %}
In a new Python script, configure your Soda Cloud connection. Use `cloud.us.soda.io` for US accounts and `cloud.soda.io` for EU accounts.

> See [Generate API keys](https://docs.soda.io/reference/generate-api-keys) for instructions on obtaining your keys.

```python
import sys, time

import pandas as pd
import requests
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

soda_cloud_url = 'https://cloud.us.soda.io'
soda_apikey = 'xxx'        # API key ID from Soda Cloud
soda_apikey_secret = 'xxx' # API key secret from Soda Cloud
```

{% endstep %}

{% step %}
Define the Snowflake table names in which to store the metadata. Use uppercase names, because Snowflake treats unquoted identifiers as uppercase.

```python
datasets_table = 'DATASETS_REPORT'
checks_table   = 'CHECKS_REPORT'
```

{% endstep %}

{% step %}
Configure your Snowflake connection:

```python
snowflake_details = snowflake.connector.connect(
    user='xxx',       # Snowflake username
    password='xxx',
    account='xxx',    # Snowflake account identifier
    warehouse='xxx',
    database='xxx',
    schema='xxx',
)
```

{% endstep %}
{% endstepper %}

### Capture and store metadata

{% stepper %}
{% step %}
Make a GET request to the Soda Cloud API to retrieve dataset information. The script exits with an error if the request is unauthorised.

```python
response_datasets = requests.get(
    soda_cloud_url + '/api/v1/datasets?page=0',
    auth=(soda_apikey, soda_apikey_secret)
)
if response_datasets.status_code in (401, 403):
    print("Unauthorized or Forbidden. Check your API keys and permissions.")
    sys.exit()
```

{% endstep %}

{% step %}
Once the connection is confirmed, iterate over all pages of results and load them into a Pandas DataFrame. The script handles API rate limiting: on HTTP 429, it pauses for 30 seconds and retries the same page.

```python
datasets = []
if response_datasets.status_code == 200:
    dataset_pages = response_datasets.json().get('totalPages')
    i = 0
    while i < dataset_pages:
        dq_datasets = requests.get(
            soda_cloud_url + '/api/v1/datasets?page=' + str(i),
            auth=(soda_apikey, soda_apikey_secret))
        if dq_datasets.status_code == 200:
            datasets.extend(dq_datasets.json().get("content"))
            i += 1
        elif dq_datasets.status_code == 429:
            print("Rate limit reached. Pausing 30 seconds.")
            time.sleep(30)

df_datasets = pd.DataFrame(datasets)
```

{% endstep %}

{% step %}
Following the same pattern, extract all check information from the Checks endpoint. This retrieves each check's name, dataset, last evaluation time, result (pass/warn/fail), owner, and any custom attributes.

```python
checks = []
response_checks = requests.get(
    soda_cloud_url + '/api/v1/checks?size=100',
    auth=(soda_apikey, soda_apikey_secret))

if response_checks.status_code == 200:
    check_pages = response_checks.json().get('totalPages')
    i = 0
    while i < check_pages:
        dq_checks = requests.get(
            soda_cloud_url + '/api/v1/checks?size=100&page=' + str(i),
            auth=(soda_apikey, soda_apikey_secret))
        if dq_checks.status_code == 200:
            checks.extend(dq_checks.json().get("content"))
            i += 1
        elif dq_checks.status_code == 429:
            time.sleep(30)

df_checks = pd.DataFrame(checks)
```

Any custom Soda attributes defined on your checks are automatically expanded into individual columns. If the target tables already exist in Snowflake, the script detects new attribute columns and adds them with `ALTER TABLE` without overwriting existing data.
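The exact response shape can vary, but custom attributes typically arrive nested inside each check record; a rough sketch of the expansion in pandas (check names and attribute keys are illustrative):

```python
import pandas as pd

# Hypothetical check records with nested custom attributes
checks = [
    {"name": "row_count",  "attributes": {"Data Domain": "Sales", "Weight": 3}},
    {"name": "duplicates", "attributes": {"Data Domain": "Finance"}},
]
df = pd.DataFrame(checks)

# Expand the nested attribute dicts into one column per attribute;
# checks missing an attribute get NaN in that column
attrs = pd.json_normalize(df.pop("attributes").tolist())
df_checks = pd.concat([df, attrs], axis=1)
```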
{% endstep %}

{% step %}
Write both DataFrames to your data source:

```python
write_pandas(snowflake_details, df_checks,   checks_table,   auto_create_table=True)
write_pandas(snowflake_details, df_datasets, datasets_table, auto_create_table=True)
```

{% endstep %}

{% step %}
To track changes over time, schedule the script to run regularly and store results in incremental tables.
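One simple approach is to stamp every row with the run time, so that each scheduled run appends a snapshot rather than replacing the previous one (`write_pandas` appends when the target table already exists and `overwrite` is left at its default); a minimal sketch:

```python
from datetime import datetime, timezone

import pandas as pd

# Stand-in for the checks frame produced by the extraction steps above
df_checks = pd.DataFrame({"name": ["row_count"], "outcome": ["pass"]})

# Stamp each row so successive runs accumulate as snapshots
df_checks["EXTRACTED_AT"] = datetime.now(timezone.utc)
# write_pandas(snowflake_details, df_checks, checks_table, auto_create_table=True)
```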
{% endstep %}
{% endstepper %}

### Build a dashboard in Sigma

{% stepper %}
{% step %}
Follow Sigma's documentation to [**connect to your data source**](https://help.sigmacomputing.com/docs/connect-to-data-sources), pointing to the database and schema where your `CHECKS_REPORT` and `DATASETS_REPORT` tables are stored.
{% endstep %}

{% step %}
Access the metadata either by [modelling data from the database tables](https://help.sigmacomputing.com/docs/create-models) directly, or by [creating a dataset using custom SQL](https://help.sigmacomputing.com/docs/create-a-dataset-from-sql#create-a-dataset-by-writing-custom-sql).
{% endstep %}

{% step %}
Create a new workbook in Sigma and add your visualisations.
{% endstep %}
{% endstepper %}

A typical data quality dashboard might include:

* **KPI tiles** — total datasets monitored, total checks executed, number of failing checks
* **Weighted data quality score** — using the `Weight` check attribute to calculate a custom health score, trended over time
* **Breakdowns by check attribute** — such as Data Quality Dimension (Completeness, Validity, Consistency, Accuracy, Timeliness, Uniqueness), Data Domain, Data Team, or Pipeline Stage
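As an illustration of the weighted score, assuming each check result has an `outcome` and a numeric `Weight` attribute (column names are hypothetical), the score can be computed as the passing share of total weight:

```python
import pandas as pd

results = pd.DataFrame({
    "outcome": ["pass", "fail", "pass"],
    "Weight":  [3, 5, 1],
})

# Score = weight of passing checks as a share of total weight, in percent
passed_weight = results.loc[results["outcome"] == "pass", "Weight"].sum()
score = round(100 * passed_weight / results["Weight"].sum(), 1)
# score -> 44.4
```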

<figure><img src="https://1123167021-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FA2PmHkO5cBgeRPdiPPOG%2Fuploads%2FzWtsA1WGcaU9AZKYv0Ba%2Fimage.png?alt=media&#x26;token=aae4aa23-6a67-4dd2-86bb-9f6f2e149342" alt=""><figcaption><p>Sigma dashboard on top of Soda Cloud Reporting API results</p></figcaption></figure>

{% hint style="info" %} <i class="fa-lightbulb">:lightbulb:</i> **Check attributes** such as Data Quality Dimension, Data Domain, Data Team, and Pipeline Stage are particularly useful for filtering and segmenting results in your dashboard. The `Weight` attribute lets you assign a numerical importance level to each check, enabling a custom data health score.

See [check and dataset attributes](https://docs.soda.io/manage-issues/check-and-dataset-attributes) for more.
{% endhint %}

{% if visitor.claims.plan ===  %}

***

### Diagnostics Warehouse dashboard

If your organisation has the Diagnostics Warehouse enabled, you can build Sigma dashboards with full failed row data. The Diagnostics Warehouse stores the complete set of failed rows produced by each check, enabling row-level dashboards that show exactly which records failed and why.

<figure><img src="https://1123167021-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FA2PmHkO5cBgeRPdiPPOG%2Fuploads%2F0cohc1LIf0tinRLHNkIh%2Fimage.png?alt=media&#x26;token=77e2c4b8-f836-48c4-a03c-fbd5060ff53d" alt=""><figcaption><p>Sigma dashboard built on top of Diagnostics Warehouse results</p></figcaption></figure>

> See [Diagnostics Warehouse](https://docs.soda.io/integrations/broken-reference) for setup instructions and the full data model.

**Prerequisites**

In addition to the prerequisites for the Reporting API dashboard, you need:

* Diagnostics Warehouse configured on your datasets. See [Configure Diagnostics Warehouse](https://docs.soda.io/integrations/broken-reference)
* Read access to the Diagnostics Warehouse schema in your data source
* The Diagnostics Warehouse schema name (set during configuration)

#### Connect Sigma to your Diagnostics Warehouse schema

If your organisation doesn't already have a Sigma connection to the data source where Diagnostics Warehouse is configured, ask your Sigma admin to create one. **The connection needs read access to the Diagnostics Warehouse schema**.

Once the connection exists:

1. Open Sigma and click **Create new → Workbook**.
2. In the workbook, click **Data** and browse to the Diagnostics Warehouse schema. You should see the following tables:
   * `scans` — one row per scan execution
   * `check_results` — one row per check per scan
   * `dataset_metadata` — dataset identity and names
   * `check_attributes` — key-value check attributes (e.g. Data Domain, Pipeline Stage)
   * `dataset_attributes` — key-value dataset attributes
   * `fr_<dataset_id>_<vXXXX>` — full failed rows per dataset (one table per monitored dataset)
   * `fk_<dataset_id>_<vXXXX>` — failed keys per dataset

**Model the data in Sigma**

Suggested steps to build a dashboard with Diagnostics Warehouse data:

* Add the `check_results` view to your workbook as your primary table. It contains check outcome, metric value, failed row count, and references to the failed rows tables.
* Join `scans` to `check_results` on `scan_id` to bring in `execution_time` and `data_time` for time-based filtering.
* Join `dataset_metadata` to `check_results` on `dataset_id` to resolve human-readable dataset names.
* To include check attributes (e.g. Data Quality Dimension, Data Domain), join `check_attributes` to `check_results` on `check_id`. Because attributes are stored as key-value rows, pivot or filter by `key` to get each attribute as a column.
* To show failed rows for a specific dataset, add the relevant `fr_<dataset_id>` view. Join it to `check_results` using `__soda_check_id` and `__soda_scan_id` to link each failed row to its check result and scan context.
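The key-value pivot described above can be sketched in pandas terms (in Sigma you would express the same reshaping in custom SQL or with a pivot element; the attribute keys and IDs here are illustrative):

```python
import pandas as pd

# Key-value attribute rows, shaped like check_attributes
check_attributes = pd.DataFrame({
    "check_id": ["c1", "c1", "c2"],
    "key":      ["Data Domain", "Pipeline Stage", "Data Domain"],
    "value":    ["Sales", "Staging", "Finance"],
})

# Pivot so each attribute key becomes its own column, one row per check
wide = (check_attributes
        .pivot(index="check_id", columns="key", values="value")
        .reset_index())
```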

**Build the workbook**

Create your workbook elements. Typical elements for a Diagnostics Warehouse dashboard include:

* **Failed row count over time** — line chart using `check_results.failed_rows_count` grouped by `scans.execution_time`
* **Failed rows table** — detailed table from `fr_<dataset_id>` showing the actual column values of each failing record
* **Check outcome breakdown** — bar or KPI using `check_results.outcome` (pass / warn / fail / not\_evaluated)
* **Filter by attribute** — control filters on `check_attributes` values such as Data Domain or Pipeline Stage to scope the view

**Keep dashboard queries stable**

Because `fr_` and `fk_` tables are versioned (`_v0000`, `_v0001`, etc.) when the source schema changes, always build your Sigma workbook against the **views** Soda creates alongside each table. The views use consistent names without version suffixes and automatically point to the latest version.
{% endif %}

