DuckDB
Access configuration details to connect Soda to a DuckDB data source.
Soda supports DuckDB as a flexible, lightweight SQL engine that can be used with native .duckdb files, in-memory data, or external dataframes such as Pandas and Polars.
Connection configuration reference
Install the following package:
pip install -i https://pypi.cloud.soda.io/simple --pre -U "soda-duckdb>4"
Data source YAML
type: duckdb
name: my_duckdb
connection:
  database: "adventureworks.duckdb" # or a supported file path like "dim_employee.parquet"
Contract YAML
dataset: datasource/main/adventureworks
columns:
  - name: id
    checks:
      - missing:
  - name: name
    checks:
      - missing:
          threshold:
            metric: percent
            must_be_less_than: 10
  - name: size
    checks:
      - invalid:
          valid_values: ['S', 'M', 'L']
checks:
  - schema:
  - row_count:
DuckDB also supports registering in-memory data frames from Pandas or Polars, and creating temporary tables for contract testing. You can run Soda contracts against these datasets by passing the live DuckDB cursor to DuckDBDataSource.from_existing_cursor as described in the following page:
Learn more: DuckDB advanced usage
Connecting to MotherDuck
You can also connect Soda to MotherDuck using the same DuckDB package. MotherDuck is a managed cloud service for DuckDB that provides persistent storage and database sharing while preserving DuckDB’s execution model. To connect, use the md: connection string and provide a MotherDuck service token via an environment variable.
Soda uses DuckDB’s native MotherDuck integration, so no additional drivers or configuration are required. The specified database is created automatically if it does not already exist. Ensure the MDTOKEN environment variable is set before running Soda.
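A small pre-flight check can make a missing token fail early with a clear message instead of surfacing as a connection error. The sketch below is one way to do that in Python using only the standard library. The helper name require_env is hypothetical and is not part of Soda or DuckDB.

```python
import os


def require_env(var_name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of Soda or DuckDB.
    """
    value = os.environ.get(var_name, "")
    if not value.strip():
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it before running Soda."
        )
    return value
```

Calling require_env("MDTOKEN") at the start of a script or CI job, before Soda is invoked, raises a RuntimeError if the token has not been exported.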
