Setup & configuration
This page explains how to configure the Soda↔Collibra integration.
Both Collibra and Soda must be configured before the integration can run. The settings covered here include asset types, attribute types, relation types, and domain mappings, which together determine how data quality checks and metadata are synchronized between Soda and Collibra.
Configuration Guide
1. Collibra Configuration
Base Settings
collibra:
  base_url: "https://your-instance.collibra.com/rest/2.0"
  username: "your-username"
  password: "your-password"
  general:
    naming_delimiter: ">" # Used to separate parts of asset names
Asset Types
Configure the different types of assets in Collibra:
asset_types:
  table_asset_type: "00000000-0000-0000-0000-000000031007" # ID for Table assets
  soda_check_asset_type: "00000000-0000-0000-0000-000000031107" # ID for Data Quality Metric type
  dimension_asset_type: "00000000-0000-0000-0000-000000031108" # ID for Data Quality Dimension type
  column_asset_type: "00000000-0000-0000-0000-000000031109" # ID for Column type
Attribute Types
Define the attributes that will be set on check assets:
Diagnostic Attributes Behavior:
Flexible Extraction: Automatically extracts metrics from any diagnostic type (missing, aggregate, valid, etc.)
Future-Proof: Works with new diagnostic types that Soda may introduce
Smart Fallbacks: Falls back to datasetRowsTested if checkRowsTested is not available
Calculated Values: Automatically computes check_rows_passed and check_passing_fraction when source data is available
Graceful Handling: Leaves attributes empty when diagnostic data is not present in the check result
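The fallback and calculated-value behavior described above can be sketched in Python. This is only an illustration under stated assumptions, not the integration's actual code: the `failedRowsCount` field and the `check_rows_tested` output key are hypothetical names, and the formula for passed rows (rows tested minus failed rows) is an assumption.

```python
from typing import Optional


def derive_row_metrics(diagnostics: dict) -> dict:
    """Derive row-count attributes from one diagnostic payload.

    `failedRowsCount` and `check_rows_tested` are assumed names used
    only for illustration.
    """
    # Prefer the check-level count; fall back to the dataset-level count.
    rows_tested: Optional[int] = diagnostics.get("checkRowsTested")
    if rows_tested is None:
        rows_tested = diagnostics.get("datasetRowsTested")

    failed: Optional[int] = diagnostics.get("failedRowsCount")

    result = {
        "check_rows_tested": rows_tested,
        "check_rows_passed": None,
        "check_passing_fraction": None,
    }

    # Compute derived values only when the source data is available;
    # otherwise the attributes stay empty.
    if rows_tested is not None and failed is not None:
        passed = rows_tested - failed
        result["check_rows_passed"] = passed
        if rows_tested > 0:
            result["check_passing_fraction"] = passed / rows_tested
    return result
```

Calling `derive_row_metrics({})` leaves every attribute as `None`, which mirrors the "Graceful Handling" behavior.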
Relation Types
Define the types of relationships between assets:

Responsibilities
Configure ownership role mappings:
Domains
Configure the domains where assets will be created:
2. Soda Configuration
Base Settings
General Settings
Attributes
Define Soda attributes and their mappings:
Multiple dimensions support
The integration supports both single and multiple dimensions for data quality checks:
Single dimension: Specify as a string value (e.g., "Completeness")
Multiple dimensions: Use a comma-separated string (e.g., "Completeness, Consistency")
When multiple dimensions are provided as a comma-separated string, the integration will:
Automatically split the string by commas and trim whitespace
Search for each dimension asset in Collibra individually
Create a relation for each dimension found
Log a warning for any dimension that cannot be found in Collibra
Continue processing even if some dimensions are missing
Example Configuration:
This will create three separate dimension relations in Collibra, one for each dimension specified.
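The comma-splitting behavior described above can be sketched as follows. This is a minimal illustration, not the integration's implementation: the `known_assets` dictionary stands in for a Collibra lookup, and the function names and asset IDs are hypothetical.

```python
import logging

logger = logging.getLogger("dimension_sketch")


def split_dimensions(raw: str) -> list[str]:
    """Split a comma-separated dimension string and trim whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def resolve_dimensions(raw: str, known_assets: dict[str, str]) -> list[str]:
    """Return the asset IDs of the dimensions found in `known_assets`.

    `known_assets` is a stand-in for a Collibra search (name -> asset ID).
    """
    found = []
    for name in split_dimensions(raw):
        asset_id = known_assets.get(name)
        if asset_id is None:
            # A missing dimension is logged and skipped; processing continues.
            logger.warning("Dimension not found in Collibra: %s", name)
            continue
        found.append(asset_id)
    return found
```

A single dimension such as `"Completeness"` passes through the same code path and yields a one-element list, so callers do not need to distinguish the two cases.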
Monitor Exclusion
The integration can exclude Soda monitors (items with metricType) from synchronization:
Enabled (sync_monitors: true): All checks and monitors are synchronized (default)
Disabled (sync_monitors: false): Only checks are synchronized, monitors are filtered out
When sync_monitors is disabled, the integration will:
Filter out all items that have a metricType attribute
Only process actual checks (items without metricType)
Log the number of monitors filtered out for each dataset
Continue processing with the remaining checks
This is useful when you want to focus on data quality checks and exclude monitoring metrics from your Collibra catalog.
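The filtering rule can be sketched as a small Python function. This is an illustration only: it assumes each item is a dictionary and that "has a metricType attribute" means the key is present, which may differ from how the integration inspects its data.

```python
def filter_monitors(items: list[dict], sync_monitors: bool = True) -> tuple[list[dict], int]:
    """Return the items to synchronize and the number of monitors filtered out."""
    if sync_monitors:
        # Default behavior: checks and monitors are both synchronized.
        return list(items), 0

    # Monitors are identified by the presence of a `metricType` key.
    checks = [item for item in items if "metricType" not in item]
    return checks, len(items) - len(checks)
```

The second return value corresponds to the per-dataset count of filtered monitors that the integration logs.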
Custom Attribute Syncing configuration
See the Custom Attribute Syncing section below for detailed instructions.
Custom Attribute Syncing
The integration supports syncing custom attributes from Soda checks to Collibra assets, allowing you to enrich your Collibra assets with business context and additional metadata from your data quality checks.
How Custom Attribute Syncing Works
Custom attribute syncing enables you to map specific attributes from your Soda checks to corresponding attribute types in Collibra. When a check is synchronized, the integration will automatically extract the values of these attributes and set them on the created/updated Collibra asset.
Configuration
To enable custom attribute syncing, add the custom_attributes_mapping_soda_attribute_name_to_collibra_attribute_type_id configuration to your config.yaml file:
The configuration value is a JSON string containing key-value pairs where:
Key: The name of the attribute in Soda (as it appears on your Soda checks)
Value: The UUID of the corresponding attribute type in Collibra
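To make the expected shape of the value concrete, the following Python sketch parses and validates such a JSON string. It is a hedged illustration rather than the integration's own validation logic; the function name is hypothetical, and any UUID you pass in should come from your own Collibra metamodel.

```python
import json
import uuid


def parse_attribute_mapping(raw: str) -> dict[str, str]:
    """Parse a JSON mapping of Soda attribute names to Collibra attribute type UUIDs."""
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("mapping must be a JSON object")

    for soda_name, type_id in mapping.items():
        if not isinstance(type_id, str):
            raise ValueError(f"UUID for {soda_name!r} must be a string")
        # uuid.UUID raises ValueError when the string is not a well-formed UUID.
        uuid.UUID(type_id)
    return mapping
```

Running a check like this before a sync can surface malformed JSON or UUIDs early; it cannot confirm that a UUID actually exists in your Collibra instance.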
Step-by-Step Setup
1. Identify Soda Attributes
First, identify which attributes from your Soda checks you want to sync to Collibra. Common examples include:
description - Check description
business_impact - Business impact assessment
data_domain - Data domain classification
criticality - Data criticality level
owner_team - Owning team information
2. Find Collibra Attribute Type UUIDs
For each Soda attribute, find the corresponding attribute type UUID in Collibra:
Navigate to your Collibra instance
Go to Settings → Metamodel → Attribute Types
Find or create the attribute types you want to map to
Copy the UUID of each attribute type
3. Create the JSON Mapping
Create a JSON object mapping Soda attribute names to Collibra attribute type UUIDs:
4. Add to Configuration
Add the JSON mapping to your config.yaml file as a single-line string:
Complete Example
Here's a complete example showing how to configure custom attribute syncing:
Soda Check with Custom Attributes:
Collibra Configuration:
Result: When this check is synchronized, the integration will create a Collibra asset with these attributes automatically set:
Description: "Ensures orders table is not empty"
Business Impact: "critical"
Data Domain: "sales"
Criticality: "high"
⚠️ Important Notes
JSON Format: The mapping must be a valid JSON string enclosed in single quotes
Attribute Type UUIDs: Use the exact UUIDs from your Collibra metamodel
Case Sensitivity: Soda attribute names are case-sensitive and must match exactly
Missing Attributes: If a Soda check doesn't have an attribute defined in the mapping, it will be skipped (no error)
Invalid UUIDs: Invalid Collibra attribute type UUIDs will cause the sync to fail for that attribute
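The case-sensitivity and missing-attribute rules above can be illustrated with a short Python sketch. It assumes check attributes and the mapping are plain dictionaries; the function name and the placeholder UUIDs in the usage note are hypothetical.

```python
def collect_custom_attributes(check_attributes: dict, mapping: dict) -> dict:
    """Return {collibra_attribute_type_id: value} for mapped attributes present on the check."""
    collected = {}
    for soda_name, type_id in mapping.items():
        # Lookup is exact and case-sensitive; attributes that are not
        # defined on the check are skipped without raising an error.
        if soda_name not in check_attributes:
            continue
        collected[type_id] = check_attributes[soda_name]
    return collected
```

For example, with a mapping for `description` and `criticality`, a check that defines only `Description` (capital D) and `criticality` would contribute just the `criticality` value.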
Troubleshooting
Common Issues:
Invalid JSON: Ensure the JSON string is properly formatted and enclosed in single quotes
Attribute Not Found: Verify the Soda attribute names match exactly what's defined in your checks
UUID Errors: Confirm the Collibra attribute type UUIDs are correct and exist in your instance
Permission Issues: Ensure your Collibra user has permissions to set the specified attribute types
Debug Mode: Run with debug mode to see detailed logging about custom attribute processing:
Look for log messages like:
Processing custom attribute: attribute_name
Successfully set custom attribute: attribute_name
Skipping custom attribute (not found in check): attribute_name
Deletion Synchronization
The integration automatically synchronizes deletions, removing obsolete check assets from Collibra when checks are deleted or removed in Soda.
How It Works
Pattern Matching: For each dataset, the integration searches for all check assets in Collibra using the naming pattern {checkname}___{datasetName}
Comparison: Compares the list of check assets in Collibra with the current checks returned from Soda
Identification: Identifies assets that exist in Collibra but are no longer present in Soda
Bulk Deletion: Deletes all obsolete assets in a single bulk operation for efficiency
Error Handling: Gracefully handles cases where assets are already deleted (404 errors), treating them as successful deletions
Metrics Tracking: Reports the number of checks deleted in the integration summary
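The comparison and identification steps amount to a set difference. The Python sketch below illustrates the idea under simplifying assumptions: asset names are compared as plain strings built from the `{checkname}___{datasetName}` pattern, and the function name is hypothetical.

```python
def find_obsolete_assets(collibra_asset_names: set[str],
                         soda_check_names: list[str],
                         dataset_name: str) -> list[str]:
    """Return Collibra check assets for a dataset that no longer exist in Soda."""
    suffix = f"___{dataset_name}"

    # Pattern matching: only consider assets that belong to this dataset.
    candidates = {name for name in collibra_asset_names if name.endswith(suffix)}

    # Comparison: the asset names the current Soda checks would produce.
    expected = {f"{check}{suffix}" for check in soda_check_names}

    # Identification: present in Collibra, absent from Soda.
    return sorted(candidates - expected)
```

The resulting list is what a bulk delete request would operate on; when it is empty, there is nothing to delete for that dataset.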
Benefits
Automatic Cleanup: Keeps your Collibra catalog in sync with Soda without manual intervention
Efficient Processing: Uses bulk deletion operations to minimize API calls
Idempotent: Safe to run multiple times - handles already-deleted assets gracefully
Transparent: Shows deletion progress in the console output and tracks metrics
Example Output
When obsolete checks are found and deleted, you'll see:
And in the summary:
Configuration
No additional configuration is required. Deletion synchronization is enabled by default and runs automatically for each dataset during the integration process.
Monitoring
Deletion synchronization is tracked in the integration metrics:
Checks deleted: Number of obsolete check assets removed from Collibra
Error Tracking: Any deletion failures are recorded in the error summary
Error Handling
404 Errors: If assets are already deleted (404 response), the integration treats this as success and continues
Other Errors: Network issues, authentication problems, or other HTTP errors are retried with exponential backoff
Missing Assets: If no check assets are found in Collibra for a dataset, deletion sync is skipped
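The error-handling rules above can be sketched as a retry loop. This is an illustrative sketch, not the integration's code: `send_delete` is a hypothetical callable that performs the request and returns an HTTP status code, and the number of attempts and the delays are assumed values.

```python
import time
from typing import Callable


def delete_with_retry(send_delete: Callable[[], int],
                      max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt a deletion, retrying failures with exponential backoff."""
    for attempt in range(max_attempts):
        status = send_delete()

        # A 404 means the asset is already gone, which counts as success.
        if 200 <= status < 300 or status == 404:
            return True

        # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; a `False` return is the case that would be recorded in the error summary.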
Ownership Synchronization
The integration supports automatic synchronization of dataset ownership from Collibra to Soda.
How It Works
Asset Discovery: For each dataset, finds the corresponding table asset in Collibra
Responsibility Extraction: Retrieves ownership responsibilities from Collibra
User Mapping: Maps Collibra users to Soda users by email address
Ownership Update: Updates the Soda dataset with synchronized owners
Error Tracking: Records any failures for monitoring
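The user-mapping step can be sketched as a lookup keyed by email address. This is a hedged illustration: the `email` and `id` field names are hypothetical, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
def map_owners_by_email(collibra_owner_emails: list[str],
                        soda_users: list[dict]) -> tuple[list[str], list[str]]:
    """Return (matched Soda user IDs, Collibra emails with no Soda match)."""
    # Index Soda users by lowercased email (case-insensitive match is assumed).
    soda_by_email = {user["email"].lower(): user["id"] for user in soda_users}

    matched, unmatched = [], []
    for email in collibra_owner_emails:
        user_id = soda_by_email.get(email.lower())
        if user_id is None:
            # Tracked as an error; remaining users are still processed.
            unmatched.append(email)
        else:
            matched.append(user_id)
    return matched, unmatched
```

The first list corresponds to the owners set on the Soda dataset, and the second to the email mismatches recorded as ownership sync failures.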
Configuration Requirements
Ensure the following are configured in your config.yaml:
Monitoring
Ownership synchronization is tracked in the integration metrics:
👥 Owners synchronized: Number of successful ownership transfers
❌ Ownership sync failures: Number of failed synchronization attempts
Error Handling
Common issues and their handling:
Missing Collibra Asset: Skip ownership sync for that dataset
No Collibra Owners: Log information message, continue processing
User Email Mismatch: Track as error, continue with remaining users
Soda API Failures: Retry with exponential backoff
Data Quality score guide

To show the Soda Data Quality score in Collibra, create an aggregation path as follows:
Navigate to Collibra Settings > Operating Model > Quality Score Aggregation
Create a new score aggregation. You will create two different aggregations as follows:


If you are using Collibra as a report catalog and want to show Quality Scores on your reports, you will create a third aggregation using the path “Report is part of data structure” & “Asset complies with Governance Asset”.
Assign the new aggregation paths to the asset types COLUMN and TABLE (and any other asset types, such as REPORT).
Navigate to Collibra Settings > Operating Model > Asset Types > Column
Click the assignment being used (Default Assignment) > Quality Score Aggregations > External Data Quality > Choose “Soda Data Quality [COLUMN]”
Navigate to Collibra Settings > Operating Model > Asset Types > Table
Click the assignment being used (Default Assignment) > Quality Score Aggregations > External Data Quality > Choose “Soda Data Quality [TABLE]”

(Optional) If you want to show the Soda Data Quality score in a diagram view on the asset types, add the above aggregations as an overlay for each asset type (Column, Table, Report) as follows:

For advanced configuration details, head to Operations & advanced usage.
