Profiling
Profiling provides a quick and comprehensive overview of a dataset’s structure and key statistics.
Profiling helps you understand the shape, quality, and uniqueness of your data before creating checks or metric monitors.
With profiling, you can explore metadata about your dataset, such as column names, data types, distinct counts, null counts, and summary statistics. You can also quickly search for specific columns to focus on the attributes that matter most to your analysis.
Profiling is useful for:
Business teams: Gain a fast understanding of what’s inside a dataset, its completeness, and potential anomalies.
Data teams: Validate schema, data types, and distributions before writing quality tests or transformations.
Data owners: Quickly identify unexpected values, nulls, or structural changes in a dataset.
Key features
Dataset overview: Displays a structured view of all columns, their types, and counts.
Interactive navigation: Scroll through the dataset structure or jump directly to a column of interest.
Search and filter: Quickly locate a column by name to review its profiling details.
Column-level insights:
Statistics
Column name
Column data type
Number of distinct values
Number of missing (null) values
Minimum, maximum, mean (for numeric columns)
Length, patterns, or categories (for text columns)
Histogram for numeric columns
Frequent values
Extreme values (for numeric columns)
Data checks that exist for this column
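To make the column-level statistics above concrete, here is a minimal Python sketch that computes a few of them for a single column held in memory. It is an illustration only, not how Soda computes its profile: the function name profile_column, the top_n parameter, and the use of None to represent a null are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def profile_column(values, top_n=3):
    """Summarize one column. None stands in for a missing (null) value.

    Hypothetical helper for illustration; not part of any Soda API.
    """
    present = [v for v in values if v is not None]
    profile = {
        "row_count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "frequent_values": Counter(present).most_common(top_n),
    }
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    is_text = bool(present) and all(isinstance(v, str) for v in present)
    if is_numeric:
        # Minimum, maximum, and mean apply to numeric columns only.
        profile.update(min=min(present), max=max(present), mean=mean(present))
    elif is_text:
        # Length statistics apply to text columns only.
        lengths = [len(v) for v in present]
        profile.update(min_length=min(lengths), max_length=max(lengths))
    return profile


ages = profile_column([34, 51, None, 34, 28])
countries = profile_column(["BE", "US", "BE", None])
```

In this sketch, ages reports one null, three distinct values, and numeric statistics, while countries reports length statistics and its most frequent value instead.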

Enable & configure Profiling
1. Enable Profiling
You can enable Profiling during dataset onboarding.
To enable Profiling on an existing dataset, follow these steps:
Go to Datasets and select the dataset you want to profile
Navigate to the Columns tab in the dataset view
Click on Update Profiling Configuration
Toggle on Enable Profiling
2. Configure Profiling
Once Profiling has been enabled, you can configure it to adapt to your organization's needs.
1. Choose a Profiling schedule
Profiling runs every 24 hours. Choose a UTC time from the dropdown menu to set the hour at which the scan runs.

2. Choose a Profiling strategy
Use sampling: Soda profiles a sample of up to 1 million rows from the dataset.
Use a time window: Soda profiles the data that falls within a 30-day time window, based on the dataset's time-partition column.

3. Click on Finish
Profiling is now scheduled.
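The schedule and strategy settings described above can be pictured with a short Python sketch. It is a rough model under stated assumptions, not Soda's implementation: the function names, the created_at partition key, the random sampling method, and the inclusive window boundary are all hypothetical. Only the 24-hour cadence, the UTC hour, the 1 million row cap, and the 30-day window come from this page.

```python
import random
from datetime import datetime, timedelta, timezone

SAMPLE_LIMIT = 1_000_000  # "up to 1 million rows"
WINDOW_DAYS = 30          # "30-day time window"


def next_profiling_run(now, hour_utc):
    """Return the next daily run at the chosen UTC hour (hypothetical helper)."""
    candidate = now.astimezone(timezone.utc).replace(
        hour=hour_utc, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


def rows_to_profile(rows, strategy, now, partition_key="created_at",
                    limit=SAMPLE_LIMIT):
    """Pick the rows a profiling scan would read under each strategy."""
    if strategy == "sampling":
        # Assumption: a uniform random sample; the real method is not documented here.
        return random.sample(rows, min(len(rows), limit))
    if strategy == "time_window":
        cutoff = now - timedelta(days=WINDOW_DAYS)
        return [row for row in rows if row[partition_key] >= cutoff]
    raise ValueError(f"unknown strategy: {strategy}")


now = datetime(2024, 5, 10, 14, 30, tzinfo=timezone.utc)
rows = [{"id": i, "created_at": now - timedelta(days=i * 10)} for i in range(6)]

later_today = next_profiling_run(now, hour_utc=20)
tomorrow = next_profiling_run(now, hour_utc=9)
recent = rows_to_profile(rows, "time_window", now)
sampled = rows_to_profile(rows, "sampling", now, limit=4)
```

With the sample data, a 20:00 UTC schedule runs later the same day, a 09:00 UTC schedule runs the next day, and the time window keeps only rows dated within the last 30 days.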
Disable Profiling
Disable column profiling at the organization level
Disabling column profiling at the organization level requires Admin privileges in your Soda Cloud account. If you have them, follow these steps:
Click your avatar.
Click on Organization settings.
Uncheck the box labeled Allow Soda to collect column profile information.
How it works
When you open Profiling for a dataset:
Soda runs a lightweight scan of the dataset’s metadata and a sample of the data (depending on configuration).
It calculates summary statistics for each column.
Results are displayed in the Profiling view for exploration.
Key considerations
Soda can only profile columns that contain NUMBERS or TEXT type data; it cannot profile columns that contain TIMESTAMP data except to create a freshness check for the anomaly dashboard.
Soda performs the Discover datasets and Profile datasets actions independently of each other. If you define exclude or include rules in the Discover tab, the Profile configuration does not inherit the Discover rules. For example, if, for Discover, you exclude all datasets that begin with staging_, then configure Profile to include all datasets, Soda discovers and profiles all datasets.
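The independence of the Discover and Profile rules can be sketched in Python as follows. This models only the Profile side of the example above and is not Soda's rule engine: the select_datasets helper, the glob-style patterns, and the dataset names are assumptions made for illustration, since this page does not show the actual rule syntax.

```python
from fnmatch import fnmatchcase


def select_datasets(datasets, include=("*",), exclude=()):
    """Keep names that match an include pattern and no exclude pattern."""
    return [
        name
        for name in datasets
        if any(fnmatchcase(name, pattern) for pattern in include)
        and not any(fnmatchcase(name, pattern) for pattern in exclude)
    ]


datasets = ["orders", "customers", "staging_orders"]

# Each action carries its own rules; neither set is merged into the other.
discover_rules = {"include": ("*",), "exclude": ("staging_*",)}
profile_rules = {"include": ("*",), "exclude": ()}

# Profile consults only its own rules, so staging_orders is profiled.
profiled = select_datasets(datasets, **profile_rules)

# For contrast: what would happen IF Profile inherited the Discover exclusion.
if_inherited = select_datasets(
    datasets,
    include=profile_rules["include"],
    exclude=discover_rules["exclude"],
)
```

Because the Profile rules are evaluated on their own, profiled contains staging_orders, whereas the hypothetical if_inherited list would not.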
Next Steps
After reviewing profiling results, you can:
Create tests based on profiling insights (e.g., "column should not have nulls").
Set up monitors to track data quality over time.
Export profiling information to support documentation and governance processes.