Spark DataFrame
Access configuration details to connect Soda to a Spark DataFrame data source.
Connection configuration reference
pip install -i https://pypi.cloud.soda.io/simple --pre -U "soda-sparkdf>4"
From existing Spark session
from pyspark.sql import SparkSession
from soda_core.contracts import verify_contract_locally
from soda_sparkdf import SparkDataFrameDataSource
spark = (
    SparkSession.builder.master("local[*]")
    .appName("soda_sparkdf")
    .getOrCreate()
)
# Create a database (schema) for organization
spark.sql("CREATE DATABASE IF NOT EXISTS my_schema")
spark.sql("USE my_schema")
# Create the DataFrame and save it as a table in the schema
df = spark.createDataFrame([(1,), (2,), (3,)], ["id"])
df.write.mode("overwrite").saveAsTable("my_table")
spark_data_source = SparkDataFrameDataSource.from_existing_session(
    session=spark,
    name="my_sparkdf"
)
result = verify_contract_locally(
    data_sources=[spark_data_source],
    contract_file_path="./my_table.yaml",
    soda_cloud_file_path="../soda-cloud.yaml",
    publish=True
)
if result.is_ok:
    print("✅ Contract verification passed.")
else:
    print("❌ Contract verification failed:")
    print(result.get_errors_str())
Example contract
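The contract file passed as contract_file_path is not reproduced on this page. As a rough sketch, assuming Soda's v4 contract language, a minimal my_table.yaml for the table created in the script might look like the following. The dataset path (data source name, schema, and table joined by slashes) and the choice of checks are assumptions for illustration, not taken from this page; confirm both against the contract language reference before relying on them.

```yaml
# Hypothetical my_table.yaml; the keys follow an assumed Soda v4 contract layout.
# Assumed path format: <data source name>/<schema>/<table>
dataset: my_sparkdf/my_schema/my_table

columns:
  - name: id
    checks:
      # Assumed check: fail if the id column contains missing values
      - missing:

checks:
  # Assumed dataset-level check: fail if the table has no rows
  - row_count:
```

With the three-row table written by the script, a contract along these lines would be expected to pass, since id has no nulls and the table is not empty.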
Example contract: Spark - Databricks
Databricks Connect compatibility
Troubleshoot
