
Soda product overview


Soda SQL

Soda SQL is a free, open-source command-line tool. It uses user-defined input to prepare SQL queries that run tests on datasets in a data source to find invalid, missing, or unexpected data. When tests fail, they surface the data that you defined as “bad” in the tests. Armed with this information, you and your data engineering team can diagnose where the “bad” data entered your data pipeline and take steps to prioritize and resolve issues based on downstream impact.
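The following Python sketch illustrates the idea behind a scan, assuming a SQLite table named `orders`: a user-defined test becomes a SQL query that computes a metric, and a failing test returns the rows defined as “bad”. The `ColumnTest` class, the `scan` function, and the table are hypothetical. This is not Soda SQL's internal implementation or API, which reads tests from scan YAML files instead.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class ColumnTest:
    """Hypothetical user-defined test (illustration only, not Soda SQL's API)."""
    name: str                      # human-readable label for the test
    bad_row_condition: str         # SQL predicate that marks a row as "bad"
    passes: Callable[[int], bool]  # condition the count of "bad" rows must meet


def scan(conn: sqlite3.Connection, table: str, test: ColumnTest) -> dict:
    # Turn the user-defined input into a SQL query that computes a metric:
    # here, the number of rows that match the "bad" condition.
    metric_sql = f"SELECT COUNT(*) FROM {table} WHERE {test.bad_row_condition}"
    bad_count = conn.execute(metric_sql).fetchone()[0]
    passed = test.passes(bad_count)

    # When the test fails, surface the rows that the test defines as "bad".
    failed_rows = []
    if not passed:
        failed_rows = conn.execute(
            f"SELECT * FROM {table} WHERE {test.bad_row_condition}"
        ).fetchall()
    return {"metric": bad_count, "passed": passed, "failed_rows": failed_rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, "c1"), (2, None), (3, "c3")]
    )
    no_missing_customers = ColumnTest(
        name="customer_id has no missing values",
        bad_row_condition="customer_id IS NULL",
        passes=lambda n: n == 0,
    )
    print(scan(conn, "orders", no_missing_customers))
    # {'metric': 1, 'passed': False, 'failed_rows': [(2, None)]}
```

The sketch interpolates the table name and predicate directly into SQL, which is acceptable for hard-coded, trusted input like this but not for values supplied by untrusted users.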

Use Soda SQL on its own to manually or programmatically scan the data that your organization uses to make decisions. Optionally, you can integrate Soda SQL with your data orchestration tool to schedule scans and automate actions based on scan results. Further, you can connect Soda SQL to a free Soda Cloud account where you and your team can use the web application to monitor test results and collaborate to keep your data issue-free.
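When you integrate scans with an orchestration tool, a common pattern is to stop the pipeline when a scan reports failed tests. The sketch below shows that pattern in plain Python. `ScanOutcome`, `quality_gate`, and `DataQualityError` are hypothetical names, not part of Soda SQL or of any orchestrator's API, and the test names are only illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScanOutcome:
    """Hypothetical scan summary: maps each test to whether it passed."""
    tests: Dict[str, bool] = field(default_factory=dict)

    def failed_tests(self) -> List[str]:
        return [name for name, passed in self.tests.items() if not passed]


class DataQualityError(RuntimeError):
    """Raised to stop a pipeline when a scan finds "bad" data."""


def quality_gate(outcome: ScanOutcome) -> None:
    # Raising an exception turns failed tests into a failed pipeline task.
    failed = outcome.failed_tests()
    if failed:
        raise DataQualityError(f"{len(failed)} test(s) failed: {', '.join(failed)}")


if __name__ == "__main__":
    outcome = ScanOutcome({"row_count > 0": True, "missing_count == 0": False})
    try:
        quality_gate(outcome)
        print("all tests passed; continue the pipeline")
    except DataQualityError as err:
        print(f"halting pipeline: {err}")
        # halting pipeline: 1 test(s) failed: missing_count == 0
```

In most orchestrators, a task that raises an exception is marked as failed, which typically prevents the tasks that depend on it from running.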

Get started with the Quick start tutorial for Soda SQL.


Soda Cloud

Soda Cloud is the web application that connects to your data source, aggregates all metrics and tests, and enables your teammates to add more of their own. Log in to the web app to examine the visualized results of scans, view historical scan data, and set up alerts that automatically notify your team when there is an issue with your data.

Soda Cloud uses Soda SQL in the background to run scheduled scans. Soda SQL uses a secure API to connect to Soda Cloud. When it completes a scan, it pushes the scan results to your Soda Cloud account where you can log in and examine the details in the web application.

Beyond increasing the observability of your data, Soda Cloud enables you to automatically detect anomalies and view samples of the rows that failed a test during a scan. Integrate Soda Cloud with your Slack workspace to collaborate with your team on data monitoring.
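This page does not describe how Soda Cloud's anomaly detection works. As a rough illustration of the general idea, the sketch below swaps in a simple z-score check that flags a new measurement of a metric, such as a daily row count, when it falls far outside the metric's history. The `is_anomaly` function and its default threshold of 3 are assumptions, not Soda Cloud's method.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomaly(history: Sequence[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of earlier measurements of the same metric."""
    if len(history) < 2:
        return False  # too little history to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > threshold


if __name__ == "__main__":
    daily_row_counts = [1000, 1010, 990, 1005, 995]
    print(is_anomaly(daily_row_counts, 1008))  # False
    print(is_anomaly(daily_row_counts, 400))   # True
```

A fixed z-score threshold is easy to reason about but ignores trends and seasonality; production anomaly detection usually models those explicitly.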

Get started with the Quick start tutorial for Soda Cloud.

Compare features and functionality

Connect Soda SQL to your Soda Cloud account to take advantage of all the features and functionality.

| Soda SQL | Soda Cloud |
| --- | --- |
| Connect to a data source using a warehouse YAML file and an env_vars YAML file | Connect to a data source via Add Datasets |
| Edit connection details in a warehouse YAML file | Edit data source connection and import details |
| Discover datasets of a newly connected data source using the soda analyze CLI command | Discover datasets of a newly connected data source during the first scheduled scan |
| Define new tests in the scan YAML file only for datasets you added via Soda SQL | Define new tests when you create new monitors for datasets you added via Soda Cloud or Soda SQL |
| Edit existing tests in the scan YAML file | Edit the tests in existing monitors |
| Use dataset metrics in the scan YAML file | Use dataset metrics when creating a new monitor |
| Use column metrics in the scan YAML file | Use column metrics when creating a new monitor |
| Use custom metrics in the scan YAML file | Use custom metrics when creating a new monitor |
| Copy + paste custom metrics from templates into your scan YAML file | |
| View scan results from tests that use template custom metrics in the command line | View scan results from tests that use template custom metrics in the Monitors dashboard |
| Configure programmatic scans | |
| Integrate with an orchestration tool such as Airflow | |
| Add filters in the scan YAML file | Add filters when creating a monitor |
| Exclude columns from scans | |
| Run an ad hoc scan | |
| Schedule scans using your data orchestration tool | Schedule scans for a data source and individual datasets |
| | View a chart to get visibility into stored measurements for a metric over time |
| | Create alerts and notifications |
| Configure scan YAML to send failed row samples to Soda Cloud | Use a missing value metric type to collect failed row samples |
| | View failed rows |
| Configure scan YAML to send sample dataset data to Soda Cloud | |
| | View sample data for a dataset |
| | Use anomaly detection |
| | Collaborate with your team to monitor your data: invite team members, and integrate with Slack |
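Several Soda SQL items in the comparison above mention filters, excluded columns, and column metrics, which you configure in the scan YAML file. The hypothetical Python function below, `column_missing_counts`, sketches what those settings do to a scan of a SQLite table: excluded columns are skipped, and only rows that match the filter are measured. The function and its parameters are illustrative assumptions, not Soda SQL's configuration keys or API.

```python
import sqlite3
from typing import Dict, List, Optional


def column_missing_counts(
    conn: sqlite3.Connection,
    table: str,
    excluded_columns: List[str],
    filter_sql: Optional[str] = None,
) -> Dict[str, int]:
    """Count missing (NULL) values per column, skipping excluded columns and
    measuring only rows that match an optional SQL filter (hypothetical)."""
    # Discover the table's columns; PRAGMA table_info is SQLite-specific.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    scanned = [c for c in columns if c not in excluded_columns]
    if not scanned:
        return {}
    # One column metric per scanned column, computed in a single query.
    # COALESCE turns the NULL that SUM returns for zero rows into 0.
    selects = ", ".join(
        f"COALESCE(SUM(CASE WHEN {c} IS NULL THEN 1 ELSE 0 END), 0)" for c in scanned
    )
    where = f" WHERE {filter_sql}" if filter_sql else ""
    row = conn.execute(f"SELECT {selects} FROM {table}{where}").fetchone()
    return dict(zip(scanned, row))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, email TEXT, note TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "2021-09-14", None, None),
            (2, "2021-09-15", "a@example.com", None),
            (3, "2021-09-15", None, "gift"),
        ],
    )
    print(column_missing_counts(
        conn, "orders", excluded_columns=["note"], filter_sql="order_date = '2021-09-15'"
    ))
    # {'id': 0, 'order_date': 0, 'email': 1}
```

Computing every column metric in one query means the filtered table is read once, rather than once per column.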



Last modified on 15-Sep-21

Was this documentation helpful?
Give us your feedback in the #soda-docs channel in the Soda community on Slack or open an issue in GitHub.