DuckDB advanced usage
Soda supports DuckDB as a flexible, lightweight SQL engine that can verify contracts against files and against in-memory DataFrames such as Pandas and Polars.
Install the following package:
pip install -i https://pypi.dev.sodadata.io/simple -U soda-duckdb
From Pandas DataFrame
import pandas as pd
import duckdb
from soda_core.contracts import verify_contracts_locally
from soda_duckdb import DuckDBDataSource

# Load the Parquet file into a Pandas DataFrame
df = pd.read_parquet("adventureworks.parquet")
# Open an in-memory DuckDB database and expose the DataFrame as a view named "adventureworks"
conn = duckdb.connect(database=":memory:")
cursor = conn.cursor()
cursor.register(view_name="adventureworks", python_object=df)
# Pass the same cursor to Soda as a data source and verify the contract against it
result = verify_contracts_locally(
    data_sources=[DuckDBDataSource.from_existing_cursor(cursor, name="duckdb")],
    contract_file_paths=["adventureworks.yml"],
)
From Polars DataFrame
import polars as pl
import duckdb
from soda_core.contracts import verify_contracts_locally
from soda_duckdb import DuckDBDataSource

# Load the Parquet file into a Polars DataFrame
df = pl.read_parquet("adventureworks.parquet")
# Open an in-memory DuckDB database and expose the DataFrame as a view named "adventureworks"
conn = duckdb.connect(database=":memory:")
cursor = conn.cursor()
cursor.register(view_name="adventureworks", python_object=df)
# Pass the same cursor to Soda as a data source and verify the contract against it
result = verify_contracts_locally(
    data_sources=[DuckDBDataSource.from_existing_cursor(cursor, name="duckdb")],
    contract_file_paths=["adventureworks.yml"],
)
In-Memory with DuckDB SQL
import duckdb
from soda_core.contracts import verify_contracts_locally
from soda_duckdb import DuckDBDataSource

# Open an in-memory DuckDB database
db_connection = duckdb.connect(database=":memory:")
cursor = db_connection.cursor()
# Create a schema and load the Parquet file into a table inside it
cursor.execute("CREATE SCHEMA analytics")
cursor.execute(
    "CREATE TABLE analytics.adventureworks AS "
    "SELECT * FROM read_parquet('adventureworks.parquet')"
)
# Verify the contract against the table through the same cursor
result = verify_contracts_locally(
    data_sources=[DuckDBDataSource.from_existing_cursor(cursor, name="duckdb")],
    contract_file_paths=["adventureworks.yml"],
)
Data from Parquet File
You can also point a DuckDB data source configuration directly at a .parquet file, using the file as the DuckDB "database". Save the following as ds.yml:
type: duckdb
name: duckdb
connection:
  database: "adventureworks.parquet"
Then you can verify a contract on this database using the CLI:
soda contract verify -ds ds.yml -c adventureworks.yml
Or use the Python API:
from soda_core.contracts import verify_contracts_locally

# Load the data source from ds.yml and verify the contract against it
result = verify_contracts_locally(
    data_source_file_path="ds.yml",
    contract_file_paths=["adventureworks.yml"],
)
