Synapse

Access configuration details to connect Soda to an Azure Synapse Analytics data source.

Requirements

Connection configuration reference

Install the following package:

pip install soda-synapse

Data source YAML

Create the config file:

soda data-source create -f ds_config.yml

The data source configuration YAML should look like the following:

ds_config.yml
type: synapse
name: my_synapse
connection:
  host: <your-server>
  port: 1433
  database: ${env.SYNAPSE_DB}
  username: ${env.SYNAPSE_USER}  # SEE NOTE
  password: ${env.SYNAPSE_PW}  # SEE NOTE
  authentication: sql  # activedirectoryserviceprincipal | activedirectoryinteractive | activedirectorypassword 
  # optional
  client_id: ${env.SYNAPSE_SERVICE_CLIENT_ID} # SEE NOTE
  client_secret: ${env.SYNAPSE_SERVICE_CLIENT_SECRET} # SEE NOTE
  driver: ODBC Driver 18 for SQL Server
  trusted_connection: false
  encrypt: false
  trust_server_certificate: false
circle-info

Note: depending on the authentication method that is used, user and password may not be required (e.g. activedirectoryserviceprincipal requires client_id and client_secret).

Connection test

Test the data source connection:

Limitations & edge cases

Regex patterns in Synapse

Synapse databases do not offer universal regex support. When using regex-based checks in these environments, Soda adapts pattern-matching queries.

Instead of regex functions, Soda translates patterns into SQL queries using the PATINDEX function.

How Soda translates regex patterns

If the user specifies a regex pattern which matches a column expression, it will be expanded in SQL as:

Key considerations

  • Collation

A collation (a rule set used for determining string matches) must be specified.

Soda uses the SQL_Latin1_General_Cp1_CS_AS, which is case-sensitive and accent-sensitive, so a and A are not equal, and accented characters (such as é) are treated distinctly.

  • Pattern handling

Regex patterns are wrapped in %…%, so any substring match within the column value will return true.

Example: ^abc becomes %abc%, which will match anywhere in the string.

  • Character ranges In SQL Server and related databases, like Synapse, character range expansion for alpha characters is handled differently than in most regex engines.

    • Soda auto-expands the most common ranges for you:

      • [a-z][abcdefghijklmnopqrstuvwxyz]

      • [A-Z][ABCDEFGHIJKLMNOPQRSTUVWXYZ]

    • Other ranges are not expanded automatically and you may need to manually expand the full set when defining your pattern.

      • If you need [0-9], [α-ω], or similar, you may have to manually expand them.

      • Example: instead of [0-9], write [0123456789].

Storage structure

Soda uses a Heap table storage structure in Synapse. Unlike indexed structures, Heap stores rows without predefined order or compression, which is required for compatibility with certain data types used in the warehouse (VARCHAR(MAX), TEXT, and XML) without character limits.

As a side effect, tables in the diagnostics warehouse may consume more storage than equivalent tables using columnar compression. This is expected behavior and a known trade-off of the current architecture.


circle-info

You are not logged in to Soda and are viewing the default public documentation. Learn more about Documentation access & licensing.

Last updated

Was this helpful?