DuckDB advanced usage
Soda supports DuckDB as a flexible, lightweight SQL engine that can be used with files and in-memory data frames such as Pandas and Polars.
Install the following package:
pip install -i https://pypi.dev.sodadata.io/simple -U soda-duckdb
From Pandas DataFrame
import pandas as pd
import duckdb
from soda_core.contracts import verify_contracts_locally
from soda_duckdb import DuckDBDataSource
df = pd.read_parquet("adventureworks.parquet")
conn = duckdb.connect(database=":memory:")
cursor = conn.cursor()
cursor.register(view_name="adventureworks", python_object=df)
result = verify_contracts_locally(
data_sources=[DuckDBDataSource.from_existing_cursor(cursor, name="duckdb")],
contract_file_paths=["adventureworks.yml"]
)
From Polars DataFrame
import polars as pl
import duckdb
from soda_core.contracts import verify_contracts_locally
from soda_duckdb import DuckDBDataSource
df = pl.read_parquet("adventureworks.parquet")
conn = duckdb.connect(database=":memory:")
cursor = conn.cursor()
cursor.register(view_name="adventureworks", python_object=df)
result = verify_contracts_locally(
data_sources=[DuckDBDataSource.from_existing_cursor(cursor, name="duckdb")],
contract_file_paths=["adventureworks.yml"]
)
In-Memory with DuckDB SQL
import duckdb
from soda_core.contracts import verify_contracts_locally
from soda_duckdb import DuckDBDataSource
db_connection = duckdb.connect(database=":memory:")
cursor = db_connection.cursor()
cursor.execute("CREATE SCHEMA analytics")
cursor.execute("CREATE TABLE analytics.adventureworks AS SELECT * FROM read_parquet('adventureworks.parquet')")
result = verify_contracts_locally(
data_sources=[DuckDBDataSource.from_existing_cursor(cursor, name="duckdb")],
contract_file_paths=["adventureworks.yml"]
)
Data from Parquet File
You can point directly to a .parquet
file as a DuckDB "database":
type: duckdb
name: duckdb
connection:
database: "adventureworks.parquet"
Then you can verify a contract on this database using the CLI:
soda contract verify -ds ds.yml -c adventureworks.yml
Or Python API:
from soda_core.contracts import verify_contracts_locally
result = verify_contracts_locally(
data_source_file_path="ds.yml",
contract_file_paths=["adventureworks.yml"]
)
Last updated
Was this helpful?