DuckDB

Access configuration details to connect Soda to a DuckDB data source.

Soda supports DuckDB as a flexible, lightweight SQL engine that can be used with native .duckdb files, in-memory data, or external dataframes such as Pandas and Polars.

Connection configuration reference

Install the following package:

pip install -i https://pypi.cloud.soda.io/simple --pre -U "soda-duckdb>4"

Data source YAML

type: duckdb 
name: my_duckdb
connection: 
    database: "adventureworks.duckdb" # or a supported file path like "dim_employee.parquet"

Contract YAML

dataset: my_duckdb/main/adventureworks

columns:
  - name: id
    checks:
      - missing:
  - name: name
    checks:
      - missing:
          threshold:
            metric: percent
            must_be_less_than: 10
  - name: size
    checks:
      - invalid:
          valid_values: ['S', 'M', 'L']

checks:
  - schema:
  - row_count:

DuckDB can also register in-memory dataframes from Pandas or Polars and create temporary tables for contract testing. To run Soda contracts against these datasets, pass the live DuckDB cursor to DuckDBDataSource.from_existing_cursor, as described on the following page:

Learn more: DuckDB advanced usage

Connecting to MotherDuck

You can also connect Soda to MotherDuck using the same DuckDB package. MotherDuck is a managed cloud service for DuckDB that provides persistent storage and database sharing while preserving DuckDB’s execution model. To connect, use the md: connection string and provide a MotherDuck service token via an environment variable.

Soda uses DuckDB’s native MotherDuck integration, so no additional drivers or configuration are required. The specified database is created automatically if it does not already exist. Ensure the MDTOKEN environment variable is set before running Soda.
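
A small pre-flight check can catch a missing token before Soda runs. The sketch below is illustrative only: the helper name motherduck_database is hypothetical, the variable name MDTOKEN is taken from the text above, and the returned value assumes the md: string goes in the same database field that otherwise holds a file path.

```python
import os


def motherduck_database(name: str, token_var: str = "MDTOKEN") -> str:
    """Return an md: connection string, failing early if the token variable is unset.

    `motherduck_database` is a hypothetical helper, not part of Soda or DuckDB.
    """
    if not os.environ.get(token_var):
        raise RuntimeError(
            f"Set the {token_var} environment variable before running Soda."
        )
    # MotherDuck databases are addressed as md:<database name>.
    return f"md:{name}"
```

With the token exported, motherduck_database("adventureworks") returns "md:adventureworks". Without it, the helper raises a RuntimeError rather than letting the connection attempt fail later.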

