Additional settings

Test a contract on a sample

When testing a data contract, you can have Soda run contract validation on a sample of your dataset instead of the full dataset. This lets you verify quickly, and at lower cost, that your contract runs correctly before you execute full scans.

Testing a contract on a sample enables you to:

  • Validate that your contract syntax, checks, and filters work as expected.

  • Reduce data warehouse compute cost while verifying new or updated contracts.

  • Iterate faster on contract definitions in development environments.

Results from sampled runs reflect only a subset of your data and may not represent its actual quality. Use full verification once your contract logic is validated.
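To illustrate why a sampled result can differ from a full verification, here is a minimal sketch in Python. It is not how Soda implements sampling: it assumes, purely for illustration, that a sample is the first N rows of a table, and it uses an in-memory SQLite database with a hypothetical orders table and a hypothetical missing_emails helper.

```python
import sqlite3

def missing_emails(conn, limit=None):
    """Count rows with a NULL email, optionally only within the first `limit` rows.

    Hypothetical helper: the table name, column name, and LIMIT-based
    sampling strategy are assumptions made for this illustration.
    """
    if limit is None:
        source = "orders"
    else:
        # Restrict the check to a sample of the table.
        source = f"(SELECT * FROM orders ORDER BY id LIMIT {int(limit)})"
    query = f"SELECT COUNT(*) FROM {source} WHERE email IS NULL"
    return conn.execute(query).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, email TEXT)")

# 100 rows; only the last 10 rows (ids 91 to 100) have a missing email.
rows = [(i, None if i > 90 else f"user{i}@example.com") for i in range(1, 101)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

sampled = missing_emails(conn, limit=50)  # check on a 50-row sample
full = missing_emails(conn)               # check on the full table

print(f"missing emails in sample: {sampled}")
print(f"missing emails in full table: {full}")
```

In this sketch the sampled check finds no missing values because the problematic rows fall outside the sample, while the full check finds them. A sampled run confirms that the check itself executes; it does not confirm the quality of the whole dataset.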

Enable sampling for test contracts

Sampling is enabled at the data source level and applies to all datasets that use that connection.

To enable this feature:

  1. Go to Data sources.

  2. Click Edit connection for a data source.

  3. Under the Connection Details section, toggle Data Sampling.

  4. Specify your sample size in the Limit field.

  5. Click Connect.


Optimize computing with multiple warehouses

When connecting to Snowflake, you must provide a warehouse as part of the data source configuration. By default, this single warehouse is used for all operations, including discovery, metric monitoring, profiling, data contract executions, and Diagnostics Warehouse operations.

The Configure warehouses per dataset feature gives you greater control and flexibility by allowing you to define specific warehouses for individual datasets. This helps you optimize cost, manage compute workloads, and allocate resources efficiently across your data operations.

This feature is available only when using Soda Agent. When using Soda Core, the warehouse can be specified directly in the connection YAML instead.

Enable the use of multiple warehouses

  1. Go to Data sources in Soda Cloud.

  2. Click Edit connection for your Snowflake data source.

  3. Toggle on Configure Warehouses.

  4. Specify the list of allowed warehouses that can be used by this connection.

  5. Choose a default warehouse to use for all datasets unless otherwise specified.

  6. Click Save in the top right to save your configuration.

Default warehouse behavior

Once enabled:

  • The warehouse specified in the data source connection is used for discovery.

  • The default warehouse (defined under Configure Warehouses) is used for:

    • Metric monitoring

    • Profiling

    • Data contract executions

    • Diagnostics Warehouse operations

  • A different warehouse can be configured at the dataset level, overriding the default.
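The precedence described above can be sketched in Python as follows. This is an illustration of the documented behavior under stated assumptions, not Soda's implementation: the WarehouseConfig class, the resolve_warehouse function, the operation names, and the warehouse names are all hypothetical. The check that a selected warehouse belongs to the allowed list is also an assumption based on the allowed-warehouses setting.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class WarehouseConfig:
    """Hypothetical model of the warehouse settings for one Snowflake data source."""
    connection_warehouse: str   # warehouse given in the data source connection
    allowed: List[str]          # warehouses listed under Configure Warehouses
    default: str                # default warehouse chosen under Configure Warehouses
    dataset_overrides: Dict[str, str] = field(default_factory=dict)

def resolve_warehouse(cfg: WarehouseConfig, operation: str,
                      dataset: Optional[str] = None) -> str:
    """Return the warehouse an operation would use under the documented precedence."""
    # Discovery always uses the warehouse from the data source connection.
    if operation == "discovery":
        return cfg.connection_warehouse

    # Other operations use the dataset-level warehouse if one is set,
    # otherwise the default warehouse.
    chosen = cfg.dataset_overrides.get(dataset, cfg.default) if dataset else cfg.default

    # Assumption: only warehouses from the allowed list can be selected.
    if chosen not in cfg.allowed:
        raise ValueError(f"warehouse {chosen!r} is not in the allowed list")
    return chosen

cfg = WarehouseConfig(
    connection_warehouse="CONNECTION_WH",
    allowed=["SMALL_WH", "LARGE_WH"],
    default="SMALL_WH",
    dataset_overrides={"orders": "LARGE_WH"},
)

print(resolve_warehouse(cfg, "discovery"))                    # connection warehouse
print(resolve_warehouse(cfg, "profiling", "customers"))       # default warehouse
print(resolve_warehouse(cfg, "contract_execution", "orders")) # dataset-level override
```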

Specify a warehouse at the dataset level

  1. Go to a dataset in Soda Cloud.

  2. Click Edit dataset.

  3. Under the Snowflake section, select the warehouse to use for this dataset.

  4. Click Save to apply your changes.
