Synapse
Access configuration details to connect Soda to an Azure Synapse Analytics data source.
Install the following package:
pip install -i https://pypi.dev.sodadata.io/simple -U soda-synapse
Data source YAML
type: synapse
name: my_synapse
connection:
  host: <your-server>
  port: 1433
  database: ${env.SYNAPSE_DB}
  username: ${env.SYNAPSE_USER} # SEE NOTE
  password: ${env.SYNAPSE_PW} # SEE NOTE
  authentication: sql # activedirectoryserviceprincipal | activedirectoryinteractive | activedirectorypassword
  # optional
  client_id: ${env.SYNAPSE_SERVICE_CLIENT_ID} # SEE NOTE
  client_secret: ${env.SYNAPSE_SERVICE_CLIENT_SECRET} # SEE NOTE
  driver: ODBC Driver 18 for SQL Server
  trusted_connection: false
  encrypt: false
  trust_server_certificate: false
Connection test
Test the data source connection:
soda data-source test -ds ds.yml
Regex patterns in Synapse
Synapse does not offer full regex support. When you use regex-based checks against a Synapse data source, Soda adapts the pattern-matching query: instead of calling a regex function, it translates the pattern into SQL that uses the PATINDEX function.
How Soda translates regex patterns
If you specify a regex pattern {pattern} to match against a column expression {column expression}, Soda expands it in SQL as:
PATINDEX('%{pattern}%', {column expression} COLLATE SQL_Latin1_General_Cp1_CS_AS) > 0
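To make the substitution concrete, here is a minimal Python sketch that fills in the template above for a given pattern and column expression. It is not Soda's source code: the function name to_patindex_sql is hypothetical, and it performs only the plain string substitution shown in the template (no quote escaping, anchor handling, or range expansion).

```python
# Hypothetical sketch of the PATINDEX template above; not Soda's implementation.
COLLATION = "SQL_Latin1_General_Cp1_CS_AS"


def to_patindex_sql(pattern: str, column_expression: str) -> str:
    """Fill the documented template: wrap the pattern in %...% and force the collation."""
    # Plain substitution only: no quote escaping, anchor stripping, or range expansion.
    return (
        f"PATINDEX('%{pattern}%', {column_expression} "
        f"COLLATE {COLLATION}) > 0"
    )


print(to_patindex_sql("abc", "email"))
# PATINDEX('%abc%', email COLLATE SQL_Latin1_General_Cp1_CS_AS) > 0
```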
Key considerations
Collation
PATINDEX matching depends on the collation (the rule set that determines how strings compare), so Soda specifies one explicitly: SQL_Latin1_General_Cp1_CS_AS. This collation is case-sensitive and accent-sensitive, so a and A are not equal, and accented characters (such as é) are treated as distinct from their unaccented forms.
Pattern handling
Regex patterns are wrapped in %…%, so the expression is true whenever the pattern matches any substring of the column value. A leading ^ anchor is not preserved: for example, ^abc becomes %abc%, which matches abc anywhere in the string.
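The difference from a real regex engine is easiest to see side by side. The Python sketch below models the translation for a literal pattern, assuming (from the ^abc example above) that a leading ^ is dropped before the pattern is wrapped in %…%. The helpers translated_pattern and patindex_matches are hypothetical, and the plain substring test stands in for PATINDEX only when the pattern contains no T-SQL wildcards such as [ ], _ or %.

```python
import re


def translated_pattern(regex: str) -> str:
    """Model of the documented rewrite: drop a leading ^, wrap the rest in %...%."""
    # Only the ^ case is taken from the docs; this is a simplified, hypothetical model.
    body = regex[1:] if regex.startswith("^") else regex
    return f"%{body}%"


def patindex_matches(regex: str, value: str) -> bool:
    """Emulate PATINDEX('%body%', value) > 0 for a literal body (no T-SQL wildcards)."""
    body = translated_pattern(regex)[1:-1]
    # Code-point substring test; approximates the case- and accent-sensitive collation.
    return body in value


print(translated_pattern("^abc"))               # %abc%
print(re.search("^abc", "xyzabc") is not None)  # False: regex anchors at the start
print(patindex_matches("^abc", "xyzabc"))       # True: matches anywhere
print(patindex_matches("abc", "ABC"))           # False: case-sensitive
```

In practice this means a check written with ^ or similar position constraints can pass on values a regex engine would reject, so it is worth testing such patterns against sample data.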
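Character ranges
In SQL Server and related databases such as Synapse, character ranges for letters are handled differently than in most regex engines: range bounds are compared using the collation's sort order rather than character codes, so under a case-sensitive collation a range such as [a-z] can also match uppercase letters.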
Soda automatically expands the two most common ranges:
[a-z] → [abcdefghijklmnopqrstuvwxyz]
[A-Z] → [ABCDEFGHIJKLMNOPQRSTUVWXYZ]
Soda does not expand any other range automatically. If your pattern uses [0-9], [α-ω], or a similar range, expand it manually by listing every character in the set: for example, write [0123456789] instead of [0-9].
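When a pattern contains several ranges, a short script can do the manual expansion before you paste the pattern into a check. This Python sketch is not part of Soda; expand_all_ranges and expand_range are hypothetical helpers that rewrite every x-y range inside a bracket expression as an explicit character list, ordered by Unicode code point, which is how most regex engines read a range.

```python
import re


def expand_range(start: str, end: str) -> str:
    """List every character from start to end, inclusive, by Unicode code point."""
    return "".join(chr(code) for code in range(ord(start), ord(end) + 1))


def expand_all_ranges(pattern: str) -> str:
    """Hypothetical helper: rewrite each x-y range inside [...] as an explicit list."""

    def expand_class(match: re.Match) -> str:
        inner = match.group(1)
        # Replace every "x-y" pair in the bracket expression with its characters.
        expanded = re.sub(
            r"(.)-(.)",
            lambda m: expand_range(m.group(1), m.group(2)),
            inner,
        )
        return f"[{expanded}]"

    # Only text inside square brackets is touched; literals outside stay as-is.
    return re.sub(r"\[([^\]]*)\]", expand_class, pattern)


print(expand_all_ranges("ID-[0-9][0-9]"))  # ID-[0123456789][0123456789]
print(expand_all_ranges("[α-ω]"))
```

Expanding by code point means [α-ω] also yields the final sigma ς, which sits between ρ and σ in Unicode, matching what a regex engine accepts for that range. The helper does not handle escaped characters or nested brackets, so review its output before using it in a check.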