DuckDB
Access configuration details to connect Soda to a DuckDB data source.
Soda supports DuckDB as a flexible, lightweight SQL engine that can be used with native .duckdb
files, in-memory data, or external dataframes such as Pandas and Polars.
Connection configuration reference
Install the following package:
pip install -i https://pypi.dev.sodadata.io/simple -U soda-duckdb
Data source YAML
type: duckdb
name: my_duckdb
connection:
  database: "adventureworks.duckdb" # or a supported file path like "dim_employee.parquet"
Contract YAML
dataset: datasource/main/adventureworks
columns:
  - name: id
    checks:
      - missing:
  - name: name
    checks:
      - missing:
          threshold:
            metric: percent
            must_be_less_than: 10
  - name: size
    checks:
      - invalid:
          valid_values: ['S', 'M', 'L']
checks:
  - schema:
  - row_count:
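To make the thresholds in the contract concrete, the following is a rough plain-Python illustration of what the column checks measure. It is not Soda's implementation. The sample rows and the helper functions are hypothetical, and it assumes that a check with no explicit threshold expects zero offending rows and that row_count with no threshold expects a non-empty dataset.

```python
# Hypothetical sample rows standing in for the adventureworks dataset.
rows = [
    {"id": 1, "name": "Ann", "size": "S"},
    {"id": 2, "name": None, "size": "M"},
    {"id": 3, "name": "Cy", "size": "XL"},
]


def missing_count(rows, column):
    """Count rows where the column value is missing (None)."""
    return sum(1 for row in rows if row[column] is None)


def missing_percent(rows, column):
    """Share of rows with a missing value, as a percentage."""
    return 100.0 * missing_count(rows, column) / len(rows) if rows else 0.0


def invalid_count(rows, column, valid_values):
    """Count non-missing values that fall outside the allowed set."""
    return sum(
        1
        for row in rows
        if row[column] is not None and row[column] not in valid_values
    )


# Assumed pass conditions, mirroring the contract above.
results = {
    "id.missing": missing_count(rows, "id") == 0,
    "name.missing": missing_percent(rows, "name") < 10,
    "size.invalid": invalid_count(rows, "size", {"S", "M", "L"}) == 0,
    "row_count": len(rows) > 0,
}

for check, passed in results.items():
    print(f"{check}: {'pass' if passed else 'fail'}")
```

With this sample data, one of three names is missing (about 33 percent, above the 10 percent limit) and "XL" is not an allowed size, so those two checks would fail under the stated assumptions.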
DuckDB supports registering in-memory dataframes from Pandas or Polars and creating temporary tables for contract testing. You can run Soda contracts against these datasets by passing the live DuckDB cursor to DuckDBDataSource.from_existing_cursor.
Learn more: DuckDB advanced usage