# Synapse

### Requirements

* Install the [Microsoft ODBC driver for SQL Server](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/install-microsoft-odbc-driver-sql-server-macos?view=sql-server-ver17) before proceeding with the connection configuration.

### Connection configuration reference

Install the following package:

```bash
pip install soda-synapse
```

#### Data source YAML

**Create the config file:**

```shellscript
soda data-source create -f ds_config.yml
```

The data source configuration YAML should look like the following:

{% code title="ds\_config.yml" %}

```yaml
type: synapse
name: my_synapse
connection:
  host: <your-server>
  port: 1433
  database: ${env.SYNAPSE_DB}
  username: ${env.SYNAPSE_USER}  # SEE NOTE
  password: ${env.SYNAPSE_PW}  # SEE NOTE
  authentication: sql  # or: activedirectoryserviceprincipal | activedirectoryinteractive | activedirectorypassword
  # optional
  client_id: ${env.SYNAPSE_SERVICE_CLIENT_ID} # SEE NOTE
  client_secret: ${env.SYNAPSE_SERVICE_CLIENT_SECRET} # SEE NOTE
  driver: ODBC Driver 18 for SQL Server
  trusted_connection: false
  encrypt: false
  trust_server_certificate: false
```

{% endcode %}

{% hint style="info" %}
**Note:** Depending on the authentication method, `username` and `password` may not be required. For example, `activedirectoryserviceprincipal` requires `client_id` and `client_secret` instead.
{% endhint %}
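
The note above can be turned into a quick pre-flight check before testing the connection. The following is a minimal Python sketch and is not part of Soda: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and only the two cases mentioned on this page are covered. It assumes that `sql` authentication needs `username` and `password`, and that a missing `authentication` key means `sql`.

```python
# Hypothetical helper, not part of Soda: checks a parsed `connection` mapping
# for the credential keys each authentication method is assumed to need.
REQUIRED_KEYS = {
    # Assumption: SQL authentication needs a username and a password.
    "sql": ("username", "password"),
    # From the note: service principal auth needs client_id and client_secret.
    "activedirectoryserviceprincipal": ("client_id", "client_secret"),
}


def missing_keys(connection: dict) -> list[str]:
    """Return the credential keys that are absent or empty for the chosen method."""
    # Assumption: fall back to "sql" when no authentication method is given.
    method = str(connection.get("authentication", "sql")).lower()
    # Methods not listed above are not checked at all.
    required = REQUIRED_KEYS.get(method, ())
    return [key for key in required if not connection.get(key)]


if __name__ == "__main__":
    connection = {
        "host": "<your-server>",
        "authentication": "activedirectoryserviceprincipal",
        "client_id": "my-client-id",
    }
    print(missing_keys(connection))  # prints ['client_secret']
```

The other authentication methods are deliberately left unchecked, because this page does not state which keys they require.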

#### Connection test

Test the data source connection:

```bash
soda data-source test -ds ds_config.yml
```

## Limitations & edge cases

### Regex patterns in Synapse

Synapse **does not offer full regex support**. When you use regex-based checks against Synapse, Soda adapts the pattern-matching queries.

Instead of calling regex functions, **Soda translates patterns into SQL queries that use the `PATINDEX` function**.

#### How Soda translates regex patterns

When a check applies a regex `pattern` to a `column expression`, Soda expands it into the following SQL:

```sql
PATINDEX('%{pattern}%', {column expression} COLLATE SQL_Latin1_General_Cp1_CS_AS) > 0
```

**Key considerations**

* **Collation**

A collation (a rule set used for determining string matches) must be specified.

Soda uses the `SQL_Latin1_General_Cp1_CS_AS` collation, which is **case-sensitive** and **accent-sensitive**: `a` and `A` are not equal, and accented characters (such as `é`) are treated as distinct from their unaccented forms.

* **Pattern handling**

Regex patterns are wrapped in `%…%`, so a match on any substring of the column value satisfies the check.

Example: `^abc` becomes `%abc%`, which will match anywhere in the string.

* **Character ranges**\
  In SQL Server and related databases, like Synapse, character range expansion for alpha characters is handled differently than in most regex engines.
  * Soda **auto-expands** the most common ranges for you:
    * `[a-z]` → `[abcdefghijklmnopqrstuvwxyz]`
    * `[A-Z]` → `[ABCDEFGHIJKLMNOPQRSTUVWXYZ]`
  * **Other ranges are not expanded automatically.** If you need `[0-9]`, `[α-ω]`, or similar, you may have to write out the full set yourself when defining your pattern.
    * Example: instead of `[0-9]`, write `[0123456789]`.
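
The translation described above can be sketched in a few lines of Python. This is an illustration only, not Soda's actual implementation: `expand_ranges` and `to_patindex` are hypothetical names, and the single-quote escaping is an added assumption about how the pattern would be embedded in a SQL string literal.

```python
# Illustrative sketch only, not Soda's actual implementation.
import string

COLLATION = "SQL_Latin1_General_Cp1_CS_AS"

# The two ranges this page lists as auto-expanded.
RANGE_EXPANSIONS = {
    "[a-z]": f"[{string.ascii_lowercase}]",
    "[A-Z]": f"[{string.ascii_uppercase}]",
}


def expand_ranges(pattern: str) -> str:
    """Replace the listed ranges with their full character sets; leave others as-is."""
    for short, full in RANGE_EXPANSIONS.items():
        pattern = pattern.replace(short, full)
    return pattern


def to_patindex(pattern: str, column_expression: str) -> str:
    """Build a PATINDEX predicate in the shape shown in the SQL template above."""
    expanded = expand_ranges(pattern)
    # Assumption: single quotes are doubled so the pattern is a valid SQL literal.
    escaped = expanded.replace("'", "''")
    return f"PATINDEX('%{escaped}%', {column_expression} COLLATE {COLLATION}) > 0"


if __name__ == "__main__":
    print(to_patindex("[A-Z][0123456789]", "product_code"))
```

In this sketch, `[A-Z]` is expanded to the full uppercase set, while the digit set is passed through exactly as written, which mirrors the manual expansion recommended for ranges such as `[0-9]`.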

### Storage structure

Soda uses the **Heap** table storage structure in Synapse. Unlike indexed structures, a **heap stores rows without a predefined order or compression**. This is required for compatibility with data types used in the warehouse that have no character limit (`VARCHAR(MAX)`, `TEXT`, and `XML`).

As a side effect, **tables in the diagnostics warehouse may consume more storage** than equivalent tables using columnar compression. This is expected behavior and a known trade-off of the current architecture.



