DuckDB advanced usage

Soda supports DuckDB as a lightweight SQL engine that can query files and in-memory DataFrames, such as those created with Pandas and Polars.

Install the following package:

pip install -i https://pypi.dev.sodadata.io/simple -U soda-duckdb

From Pandas DataFrame

import pandas as pd
import duckdb
from soda_core.contracts import verify_contract_locally
from soda_duckdb import DuckDBDataSource

# Load the data and expose it to DuckDB as a view named "adventureworks"
df = pd.read_parquet("adventureworks.parquet")
conn = duckdb.connect(database=":memory:")
cursor = conn.cursor()
cursor.register(view_name="adventureworks", python_object=df)

# Verify the contract against the registered view through the existing cursor
result = verify_contract_locally(
    data_sources=[DuckDBDataSource.from_existing_cursor(cursor, name="duckdb")],
    contract_file_path="adventureworks.yml",
)

From Polars DataFrame


In-Memory with DuckDB SQL


Data from Parquet File

You can point directly to a .parquet file as a DuckDB "database":

Then you can verify a contract on this database using the CLI:

Or Python API:

