Setup & configuration

This page explains how to configure the Soda↔Collibra integration.

Both Collibra and Soda must be configured before the integration can run. The settings covered here include asset types, attribute types, relation types, and domain mappings. Together they establish the foundation for reliable synchronization of data quality checks and metadata between Soda and Collibra.


Configuration Guide

1. Collibra Configuration

Base Settings

collibra:
  base_url: "https://your-instance.collibra.com/rest/2.0"
  username: "your-username"
  password: "your-password"
  general:
    naming_delimiter: ">"  # Used to separate parts of asset names

Asset Types

Configure the different types of assets in Collibra:

  asset_types:
    table_asset_type: "00000000-0000-0000-0000-000000031007"  # ID for Table assets
    soda_check_asset_type: "00000000-0000-0000-0000-000000031107"  # ID for Data Quality Metric type
    dimension_asset_type: "00000000-0000-0000-0000-000000031108"  # ID for Data Quality Dimension type
    column_asset_type: "00000000-0000-0000-0000-000000031109"  # ID for Column type

Attribute Types

Define the attributes that will be set on check assets:

Diagnostic Attributes Behavior:

  • Flexible Extraction: Automatically extracts metrics from any diagnostic type (missing, aggregate, valid, etc.)

  • Future-Proof: Works with new diagnostic types that Soda may introduce

  • Smart Fallbacks: Falls back to datasetRowsTested if checkRowsTested is not available

  • Calculated Values: Automatically computes check_rows_passed and check_passing_fraction when source data is available

  • Graceful Handling: Leaves attributes empty when diagnostic data is not present in the check result
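
As an illustration only, the behavior listed above can be sketched in Python. The nested input shape, the `failedRowsCount` key, and the `check_rows_tested` / `check_rows_failed` output names are assumptions made for this sketch. Only `checkRowsTested`, `datasetRowsTested`, `check_rows_passed`, and `check_passing_fraction` are named on this page, and the integration's real code may differ.

```python
from typing import Any, Optional

ATTRIBUTE_NAMES = (
    "check_rows_tested",   # assumed output name
    "check_rows_failed",   # assumed output name
    "check_rows_passed",
    "check_passing_fraction",
)


def extract_diagnostic_attributes(diagnostics: Optional[dict]) -> dict:
    """Derive row-count attributes from one check result's diagnostics.

    Assumed input shape: {"<diagnostic type>": {"checkRowsTested": ..., ...}}.
    The "failedRowsCount" key is hypothetical.
    """
    result: dict[str, Any] = dict.fromkeys(ATTRIBUTE_NAMES)  # every value starts as None
    # Flexible extraction: use the first dict-valued entry, whatever its type name is.
    metrics = next(
        (v for v in (diagnostics or {}).values() if isinstance(v, dict)), None
    )
    if metrics is None:
        return result  # graceful handling: leave every attribute empty

    tested = metrics.get("checkRowsTested")
    if tested is None:
        tested = metrics.get("datasetRowsTested")  # fallback described above
    failed = metrics.get("failedRowsCount")
    result["check_rows_tested"] = tested
    result["check_rows_failed"] = failed

    # Calculated values: only when both source numbers are present.
    if tested is not None and failed is not None:
        result["check_rows_passed"] = tested - failed
        if tested > 0:
            result["check_passing_fraction"] = (tested - failed) / tested
    return result
```

Because the function picks the first dictionary it finds rather than a named diagnostic type, the same code path would handle `missing`, `aggregate`, `valid`, or a type introduced later.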

Relation Types

Define the types of relationships between assets:

Responsibilities

Configure ownership role mappings:

Domains

Configure the domains where assets will be created:

2. Soda Configuration

Base Settings

General Settings

Attributes

Define Soda attributes and their mappings:

Multiple dimensions support

The integration supports both single and multiple dimensions for data quality checks:

  • Single dimension: Specify as a string value (e.g., "Completeness")

  • Multiple dimensions: Use a comma-separated string (e.g., "Completeness, Consistency")

When multiple dimensions are provided as a comma-separated string, the integration will:

  1. Automatically split the string by commas and trim whitespace

  2. Search for each dimension asset in Collibra individually

  3. Create a relation for each dimension found

  4. Log a warning for any dimension that cannot be found in Collibra

  5. Continue processing even if some dimensions are missing
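
The five steps above can be sketched as follows. This is a minimal illustration, not the integration's code: `known_dimensions` is a hypothetical stand-in for the Collibra asset lookup, and the returned IDs stand in for the relations that would be created.

```python
import logging

logger = logging.getLogger("dimension_sketch")


def split_dimensions(raw: str) -> list[str]:
    """Split a comma-separated dimension string and trim whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def resolve_dimension_ids(raw: str, known_dimensions: dict[str, str]) -> list[str]:
    """Return the asset ID of each dimension that can be found.

    known_dimensions (name -> asset ID) is a hypothetical replacement for
    searching Collibra; one relation would be created per returned ID.
    """
    found = []
    for name in split_dimensions(raw):
        asset_id = known_dimensions.get(name)
        if asset_id is None:
            # Missing dimensions are logged and skipped; processing continues.
            logger.warning("Dimension not found in Collibra: %s", name)
            continue
        found.append(asset_id)
    return found
```

A single dimension such as "Completeness" passes through the same function and yields a one-element list, so both input forms share one code path.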

Example Configuration:

This will create three separate dimension relations in Collibra, one for each dimension specified.

Monitor Exclusion

The integration can exclude Soda monitors (items with metricType) from synchronization:

  • Enabled (sync_monitors: true): All checks and monitors are synchronized (default)

  • Disabled (sync_monitors: false): Only checks are synchronized, monitors are filtered out

When sync_monitors is disabled, the integration will:

  1. Filter out all items that have a metricType attribute

  2. Only process actual checks (items without metricType)

  3. Log the number of monitors filtered out for each dataset

  4. Continue processing with the remaining checks
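
A minimal sketch of this filter, assuming each item returned for a dataset is a dictionary and that a monitor is any item with a non-empty `metricType` key. The function name and return shape are illustrative only.

```python
def filter_monitors(items: list[dict], sync_monitors: bool = True) -> tuple[list[dict], int]:
    """Return (items to synchronize, number of monitors filtered out)."""
    if sync_monitors:
        return list(items), 0  # default: checks and monitors are both kept
    # Assumption: a monitor is any item carrying a non-null metricType.
    checks = [item for item in items if item.get("metricType") is None]
    return checks, len(items) - len(checks)
```

The second return value corresponds to the per-dataset count of filtered monitors that the integration logs.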

This is useful when you want to focus on data quality checks and exclude monitoring metrics from your Collibra catalog.

Custom Attribute Syncing configuration

See the Custom Attribute Syncing section below for detailed instructions.


Custom Attribute Syncing

The integration supports syncing custom attributes from Soda checks to Collibra assets, allowing you to enrich your Collibra assets with business context and additional metadata from your data quality checks.

How Custom Attribute Syncing Works

Custom attribute syncing enables you to map specific attributes from your Soda checks to corresponding attribute types in Collibra. When a check is synchronized, the integration will automatically extract the values of these attributes and set them on the created/updated Collibra asset.

Configuration

To enable custom attribute syncing, add the custom_attributes_mapping_soda_attribute_name_to_collibra_attribute_type_id configuration to your config.yaml file:

The configuration value is a JSON string containing key-value pairs where:

  • Key: The name of the attribute in Soda (as it appears on your Soda checks)

  • Value: The UUID of the corresponding attribute type in Collibra
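
To illustrate the expected shape of this value, the sketch below parses a mapping string and checks that each value is a syntactically valid UUID. The function name is hypothetical, and the check is purely local: it cannot confirm that the attribute type exists in your Collibra instance.

```python
import json
import uuid


def parse_attribute_mapping(raw: str) -> dict[str, str]:
    """Parse the JSON mapping string and verify that every value looks like a UUID."""
    mapping = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(mapping, dict):
        raise ValueError("mapping must be a JSON object")
    for soda_name, type_id in mapping.items():
        if not isinstance(type_id, str):
            raise ValueError(f"attribute type ID for {soda_name!r} must be a string")
        uuid.UUID(type_id)  # raises ValueError if the string is not a well-formed UUID
    return mapping
```

Running a check like this before a sync can surface malformed JSON or mistyped UUIDs early, rather than as a failure on an individual attribute.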

Step-by-Step Setup

1. Identify Soda Attributes

First, identify which attributes from your Soda checks you want to sync to Collibra. Common examples include:

  • description - Check description

  • business_impact - Business impact assessment

  • data_domain - Data domain classification

  • criticality - Data criticality level

  • owner_team - Owning team information

2. Find Collibra Attribute Type UUIDs

For each Soda attribute, find the corresponding attribute type UUID in Collibra:

  1. Navigate to your Collibra instance

  2. Go to Settings > Metamodel > Attribute Types

  3. Find or create the attribute types you want to map to

  4. Copy the UUID of each attribute type

3. Create the JSON Mapping

Create a JSON object mapping Soda attribute names to Collibra attribute type UUIDs:

4. Add to Configuration

Add the JSON mapping to your config.yaml file as a single-line string:
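
If you prefer to generate the line rather than type it by hand, a short Python sketch such as the following produces a single-line, single-quoted value. The UUID in the usage note is a placeholder, and where the key sits within config.yaml (its nesting and indentation) is not shown here, so place the output according to your existing file.

```python
import json

CONFIG_KEY = "custom_attributes_mapping_soda_attribute_name_to_collibra_attribute_type_id"


def to_config_line(mapping: dict[str, str]) -> str:
    """Render the mapping as one YAML line with a single-quoted JSON value."""
    compact = json.dumps(mapping)  # json.dumps emits a single line by default
    # In YAML single-quoted scalars, a literal single quote is escaped by doubling it.
    escaped = compact.replace("'", "''")
    return f"{CONFIG_KEY}: '{escaped}'"
```

For example, `to_config_line({"description": "00000000-0000-0000-0000-000000000001"})` returns the key followed by `'{"description": "00000000-0000-0000-0000-000000000001"}'`.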

Complete Example

Here's a complete example showing how to configure custom attribute syncing:

Soda Check with Custom Attributes:

Collibra Configuration:

Result: When this check is synchronized, the integration will create a Collibra asset with these attributes automatically set:

  • Description: "Ensures orders table is not empty"

  • Business Impact: "critical"

  • Data Domain: "sales"

  • Criticality: "high"

⚠️ Important Notes

  1. JSON Format: The mapping must be a valid JSON string enclosed in single quotes

  2. Attribute Type UUIDs: Use the exact UUIDs from your Collibra metamodel

  3. Case Sensitivity: Soda attribute names are case-sensitive and must match exactly

  4. Missing Attributes: If a Soda check doesn't have an attribute defined in the mapping, it will be skipped (no error)

  5. Invalid UUIDs: Invalid Collibra attribute type UUIDs will cause the sync to fail for that attribute
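
Notes 3 and 4 can be illustrated with a short sketch. It assumes the check's attributes are available as a plain dictionary; the function name and the placeholder UUIDs in the usage note are hypothetical.

```python
def select_custom_attributes(check_attributes: dict, mapping: dict[str, str]) -> dict[str, str]:
    """Return {Collibra attribute type ID: value} for mapped attributes present on the check."""
    selected = {}
    for soda_name, type_id in mapping.items():
        # Dictionary lookup is case-sensitive: "Criticality" does not match "criticality".
        if soda_name not in check_attributes:
            continue  # attribute missing on this check: skipped without error
        selected[type_id] = str(check_attributes[soda_name])
    return selected
```

With a mapping of `description` and `criticality` to two placeholder UUIDs, a check that only defines `description` yields a single entry, and a check that defines `Criticality` with a capital C yields none for that key.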

Troubleshooting

Common Issues:

  • Invalid JSON: Ensure the JSON string is properly formatted and enclosed in single quotes

  • Attribute Not Found: Verify the Soda attribute names match exactly what's defined in your checks

  • UUID Errors: Confirm the Collibra attribute type UUIDs are correct and exist in your instance

  • Permission Issues: Ensure your Collibra user has permissions to set the specified attribute types

Debug Mode: Run with debug mode to see detailed logging about custom attribute processing:

Look for log messages like:

  • Processing custom attribute: attribute_name

  • Successfully set custom attribute: attribute_name

  • Skipping custom attribute (not found in check): attribute_name


Deletion Synchronization

The integration automatically synchronizes deletions, removing obsolete check assets from Collibra when checks are deleted or removed in Soda.

How It Works

  1. Pattern Matching: For each dataset, the integration searches for all check assets in Collibra using the naming pattern {checkname}___{datasetName}

  2. Comparison: Compares the list of check assets in Collibra with the current checks returned from Soda

  3. Identification: Identifies assets that exist in Collibra but are no longer present in Soda

  4. Bulk Deletion: Deletes all obsolete assets in a single bulk operation for efficiency

  5. Error Handling: Gracefully handles cases where assets are already deleted (404 errors), treating them as successful deletions

  6. Metrics Tracking: Reports the number of checks deleted in the integration summary
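
Steps 1 to 3 amount to a set difference, sketched below under the assumption that asset names follow the `{checkname}___{datasetName}` pattern exactly. The function name is illustrative, and the actual Collibra search and bulk delete calls are left out.

```python
def find_obsolete_assets(
    collibra_asset_names: list[str],
    soda_check_names: list[str],
    dataset_name: str,
) -> list[str]:
    """Return Collibra check asset names that no longer have a matching Soda check."""
    # Build the asset names that the current Soda checks are expected to have.
    expected = {f"{check_name}___{dataset_name}" for check_name in soda_check_names}
    # Anything in Collibra that is not expected is obsolete.
    return sorted(set(collibra_asset_names) - expected)
```

The resulting list is what would be passed to a single bulk deletion request; an empty list means there is nothing to delete for that dataset.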

Benefits

  • Automatic Cleanup: Keeps your Collibra catalog in sync with Soda without manual intervention

  • Efficient Processing: Uses bulk deletion operations to minimize API calls

  • Idempotent: Safe to run multiple times - handles already-deleted assets gracefully

  • Transparent: Shows deletion progress in the console output and tracks metrics

Example Output

When obsolete checks are found and deleted, you'll see:

And in the summary:

Configuration

No additional configuration is required. Deletion synchronization is enabled by default and runs automatically for each dataset during the integration process.

Monitoring

Deletion synchronization is tracked in the integration metrics:

  • Checks deleted: Number of obsolete check assets removed from Collibra

  • Error Tracking: Any deletion failures are recorded in the error summary

Error Handling

  • 404 Errors: If assets are already deleted (404 response), the integration treats this as success and continues

  • Other Errors: Network issues, authentication problems, or other HTTP errors are retried with exponential backoff

  • Missing Assets: If no check assets are found in Collibra for a dataset, deletion sync is skipped
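
The first two rules can be sketched as a retry loop. This is an illustration under assumptions: `send_delete` is a hypothetical callable that performs the request and returns an HTTP status code, and the attempt count and base delay are arbitrary values, not the integration's actual settings.

```python
import time
from typing import Callable


def delete_with_retry(
    send_delete: Callable[[], int],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send_delete until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            status = send_delete()
        except ConnectionError:
            status = None  # network failure: treated like any other retryable error
        # 2xx is success; 404 means the asset is already gone, which also counts.
        if status is not None and (200 <= status < 300 or status == 404):
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s with the defaults
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. A `False` result corresponds to a deletion failure that would be recorded in the error summary.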


Ownership Synchronization

The integration supports automatic synchronization of dataset ownership from Collibra to Soda.

How It Works

  1. Asset Discovery: For each dataset, finds the corresponding table asset in Collibra

  2. Responsibility Extraction: Retrieves ownership responsibilities from Collibra

  3. User Mapping: Maps Collibra users to Soda users by email address

  4. Ownership Update: Updates the Soda dataset with synchronized owners

  5. Error Tracking: Records any failures for monitoring
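
Step 3, the email-based user mapping, can be sketched as follows. The `email` and `id` field names are hypothetical, and comparing addresses case-insensitively is an assumption of this sketch rather than documented behavior.

```python
def map_owners_by_email(
    collibra_owners: list[dict],
    soda_users: list[dict],
) -> tuple[list[str], list[str]]:
    """Return (matched Soda user IDs, Collibra owner emails with no Soda match)."""
    # Index Soda users by lower-cased email (assumed field names: "email", "id").
    soda_by_email = {user["email"].lower(): user["id"] for user in soda_users}
    matched: list[str] = []
    unmatched: list[str] = []
    for owner in collibra_owners:
        email = owner.get("email", "")
        user_id = soda_by_email.get(email.lower())
        if user_id is None:
            unmatched.append(email)  # tracked as an error; remaining owners still processed
        else:
            matched.append(user_id)
    return matched, unmatched
```

The matched IDs are what would be written to the Soda dataset as owners, while the unmatched list corresponds to the failures recorded for monitoring.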

Configuration Requirements

Ensure the following are configured in your config.yaml:

Monitoring

Ownership synchronization is tracked in the integration metrics:

  • 👥 Owners synchronized: Number of successful ownership transfers

  • ❌ Ownership sync failures: Number of failed synchronization attempts

Error Handling

Common issues and their handling:

  • Missing Collibra Asset: Skip ownership sync for that dataset

  • No Collibra Owners: Log an informational message and continue processing

  • User Email Mismatch: Track as error, continue with remaining users

  • Soda API Failures: Retry with exponential backoff

Data Quality score guide

Soda-calculated Data Quality scores can be published into Collibra and plotted as a time-series quality score history.

To show the Soda Data Quality score in Collibra, create an aggregation path as follows:

  1. Navigate to Collibra Settings > Operating Model > Quality Score Aggregation

  2. Create a new score aggregation. You will create two different aggregations as follows:

  3. Assign the new aggregation paths to the asset types COLUMN and TABLE (and any other asset types, such as REPORT).

  • Navigate to Collibra Settings > Operating Model > Asset Types > Column

  • Click the assignment being used (Default Assignment) > Quality Score Aggregations > External Data Quality > Choose "Soda Data Quality [COLUMN]"

  • Navigate to Collibra Settings > Operating Model > Asset Types > Table

  • Click the assignment being used (Default Assignment) > Quality Score Aggregations > External Data Quality > Choose "Soda Data Quality [TABLE]"

  4. (Optional) To show the Soda Data Quality score in a diagram view on the asset types, add the above aggregations as an overlay for each asset type (Column, Table, Report) as follows:


For advanced configuration details, head to Operations & advanced usage.

