# DuckDB

Soda supports DuckDB as a flexible, lightweight SQL engine that can be used with native `.duckdb` files, in-memory data, or external dataframes such as Pandas and Polars.

### Connection configuration reference

Install the following package:

```bash
pip install soda-duckdb
```

#### Data source YAML

**Create the config file:**

```shellscript
soda data-source create -f ds_config.yml
```

The data source configuration YAML should look like the following:

{% code title="ds\_config.yml" %}

```yaml
type: duckdb
name: my_duckdb
connection:
    database: "adventureworks.duckdb" # or a supported file path like "dim_employee.parquet"
```

{% endcode %}

#### Contract YAML

{% code title="contract.yml" %}

```yaml
dataset: datasource/main/adventureworks

columns:
  - name: id
    checks:
      - missing:
  - name: name
    checks:
      - missing:
          threshold:
            metric: percent
            must_be_less_than: 10
  - name: size
    checks:
      - invalid:
          valid_values: ['S', 'M', 'L']

checks:
  - schema:
  - row_count:
```

{% endcode %}
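To make the thresholds in the contract above concrete, here is a minimal plain-Python sketch of what each check is asking of the data. It is not Soda's implementation. The sample rows are invented for illustration, and the defaults it assumes (a `missing` or `invalid` check with no threshold expects zero offending rows, null values are not counted as invalid, and `row_count` expects at least one row) should be confirmed against the Soda contract documentation.

```python
# Hypothetical sample rows standing in for the dataset under contract.
rows = [
    {"id": 1, "name": "Road Bike", "size": "M"},
    {"id": 2, "name": None, "size": "XL"},
    {"id": 3, "name": "Helmet", "size": None},
]


def missing_count(rows, column):
    """Number of rows where the column is null."""
    return sum(1 for row in rows if row.get(column) is None)


def missing_percent(rows, column):
    """Share of rows (0-100) where the column is null."""
    return 100.0 * missing_count(rows, column) / len(rows) if rows else 0.0


def invalid_count(rows, column, valid_values):
    """Number of non-null values outside the allowed set (assumption: nulls are not invalid)."""
    return sum(
        1
        for row in rows
        if row.get(column) is not None and row[column] not in valid_values
    )


# One boolean per check in contract.yml; True means the check would pass.
results = {
    "id.missing": missing_count(rows, "id") == 0,             # assumed default: no missing values
    "name.missing": missing_percent(rows, "name") < 10,       # must_be_less_than: 10 (percent)
    "size.invalid": invalid_count(rows, "size", {"S", "M", "L"}) == 0,
    "row_count": len(rows) > 0,                                # assumed default: at least one row
}

for check, passed in results.items():
    print(f"{check}: {'pass' if passed else 'fail'}")
```

With these sample rows, one of three `name` values is missing (about 33 percent), so the `name` check fails the 10 percent threshold, and `XL` makes the `size` check fail.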

***

DuckDB also supports registering **in-memory data frames** from **Pandas** or **Polars**, and creating **temporary tables for contract testing**. You can run Soda contracts against these datasets by passing the live DuckDB cursor to `DuckDBDataSource.from_existing_cursor`, as described on the following page:

> Learn more: [duckdb-advanced-usage](https://docs.soda.io/reference/data-source-reference-for-soda-core/duckdb/duckdb-advanced-usage "mention")

### Connecting to MotherDuck

You can also connect Soda to **MotherDuck** using the same DuckDB package. MotherDuck is a managed cloud service for DuckDB that provides persistent storage and database sharing while preserving DuckDB’s execution model. To connect, use the `md:` connection string and provide a MotherDuck service token via an environment variable.

```yaml
type: duckdb
name: my_duckdb
connection:
  database: "md:my_db?motherduck_token=${env.MDTOKEN}"
```

Soda uses DuckDB’s native MotherDuck integration, so no additional drivers or configuration are required. The specified database is created automatically if it does not already exist. Ensure the `MDTOKEN` environment variable is set before running Soda.
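If you want to fail fast when the token is missing, a small pre-flight check can help. The sketch below is only an illustration: `resolve_env_placeholders` is a hypothetical helper, not part of Soda, and the regular expression is an assumption about the `${env.NAME}` placeholder shape shown above. Soda performs the real substitution itself when it reads the configuration.

```python
import os
import re

# Assumed placeholder shape, based on the ${env.MDTOKEN} example above.
ENV_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_placeholders(text, env=None):
    """Replace ${env.NAME} placeholders, raising if a variable is unset or empty."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        value = env.get(name)
        if not value:
            raise KeyError(f"environment variable {name} is not set")
        return value

    return ENV_PLACEHOLDER.sub(replace, text)


database = "md:my_db?motherduck_token=${env.MDTOKEN}"

# Demonstration with an explicit mapping; omit `env` to read os.environ instead.
resolved = resolve_env_placeholders(database, env={"MDTOKEN": "example-token"})
```

Calling the helper without the `env` argument before running Soda raises a `KeyError` if `MDTOKEN` is unset, which surfaces the problem before any connection attempt. Avoid printing the resolved string, since it contains the token.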

