Soda SQL is a free, open-source command-line tool. It uses user-defined input to prepare SQL queries that test datasets in a data source for invalid, missing, or unexpected data. When tests fail, they surface the data that you defined as “bad” in the tests. Armed with this information, you and your data engineering team can diagnose where the “bad” data entered your data pipeline, then prioritize and resolve issues based on downstream impact.
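To make the idea of turning user-defined tests into SQL queries concrete, here is a minimal sketch in Python. It is an illustration of the general technique only, not Soda SQL's implementation: the `TESTS` dictionary, the `run_tests` function, and the `customers` table are all hypothetical, and the example uses an in-memory SQLite database in place of a real data source.

```python
import sqlite3

# Hypothetical test definitions: a test name mapped to a SQL predicate
# that is true for rows the user considers "bad".
TESTS = {
    "missing_email": "email IS NULL",
    "invalid_age": "age < 0 OR age > 120",
}


def run_tests(conn, table, tests):
    """Return {test_name: [bad rows]} for every test that finds bad data."""
    failures = {}
    for name, bad_row_predicate in tests.items():
        # In this sketch the table name and predicates come from trusted,
        # user-written configuration, so they are interpolated directly.
        query = f"SELECT * FROM {table} WHERE {bad_row_predicate}"
        bad_rows = conn.execute(query).fetchall()
        if bad_rows:
            failures[name] = bad_rows
    return failures


# Stand-in data source: one clean row, one missing email, one invalid age.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "a@example.com", 34), (2, None, 51), (3, "c@example.com", -5)],
)

failures = run_tests(conn, "customers", TESTS)
for name, rows in failures.items():
    print(f"FAILED {name}: {rows}")
```

Each failed test reports the offending rows, which is the information a team needs to trace where the bad data entered the pipeline.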
Use Soda SQL on its own to manually or programmatically scan the data that your organization uses to make decisions. Optionally, you can integrate Soda SQL with your data orchestration tool to schedule scans and automate actions based on scan results. Further, you can connect Soda SQL to a free Soda Cloud account where you and your team can use the web application to monitor test results and collaborate to keep your data issue-free.
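As a rough sketch of automating actions based on scan results, the following Python wraps a command-line scan and decides whether a pipeline should continue. It assumes that the `soda scan` command takes a warehouse YAML file and a scan YAML file as arguments and that an exit code of 0 means no tests failed; check both against the Soda SQL documentation for your version. The function names and file paths are hypothetical.

```python
import subprocess


def build_scan_command(warehouse_yml, scan_yml):
    """Build the argument list for one scan (assumed CLI form; paths are placeholders)."""
    return ["soda", "scan", warehouse_yml, scan_yml]


def run_scan(warehouse_yml, scan_yml, runner=subprocess.run):
    """Run one scan and return True when the process exits with code 0.

    The runner is injectable so the logic can be exercised without
    Soda SQL installed.
    """
    completed = runner(build_scan_command(warehouse_yml, scan_yml))
    return completed.returncode == 0


def gate_pipeline(scan_passed):
    """Decide the next pipeline step from the scan outcome."""
    return "continue" if scan_passed else "halt"
```

A pipeline step could then call `gate_pipeline(run_scan("warehouse.yml", "tables/orders.yml"))` and stop downstream jobs when the result is `"halt"`. An orchestration tool would typically replace `gate_pipeline` with its own task-failure mechanism.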
Get started with the Quick start tutorial for Soda SQL.
Soda Cloud is the web application that connects to your data source and runs scheduled scans of your data. Log in to the web app to examine the visualized results of scans, view historical scan data, and set up alerts that automatically notify your team when there is an issue with your data.
Soda Cloud uses Soda SQL in the background to run scheduled scans. Soda SQL uses a secure API to connect to Soda Cloud. When Soda SQL completes a scan, it pushes the scan results to your Soda Cloud account, where you can log in and examine the details in the web application.
Beyond increasing the observability of your data, Soda Cloud enables you to automatically detect anomalies and view samples of the rows that failed a test during a scan. Integrate Soda Cloud with your Slack workspace to collaborate with your team on data monitoring.
Get started with the Quick start tutorial for Soda Cloud.
Connect Soda SQL to your Soda Cloud account to take advantage of the features of both tools, compared in the following table.
|Soda SQL||Soda Cloud|
|Connect to a data source using a warehouse YAML file and an env_vars YAML file||Connect to a data source via Add Datasets|
|Edit connection details in a warehouse YAML file||Edit data source connection and import details|
|Discover datasets of a newly-connected data source using the ||Discover datasets of a newly-connected data source during the first scheduled scan|
|Define new tests in the scan YAML file only for datasets you added via Soda SQL||Define new tests when you create new monitors for datasets you added via Soda Cloud or Soda SQL|
|Edit existing tests in the scan YAML file||Edit the tests in existing monitors|
|Use dataset metrics in the scan YAML file||Use dataset metrics when creating a new monitor|
|Use column metrics in the scan YAML file||Use column metrics when creating a new monitor|
|Use custom metrics in the scan YAML file|||
|Configure programmatic scans|||
|Integrate with an orchestration tool such as Airflow|||
|Add filters in the scan YAML file||Add filters when creating a monitor|
|Run an ad hoc scan|||
|Schedule scans using your data orchestration tool||Schedule scans for a data source and datasets|
|||Get visibility into stored measurements for a metric over time|
|||Create alerts and notifications|
|Configure scan YAML to send failed row samples to Soda Cloud|||
|Use a missing value metric type to collect failed row samples|||
|||View failed rows|
|Configure scan YAML to send sample dataset data to Soda Cloud|||
|||View sample data for a dataset|
|||Use anomaly detection|
|||Collaborate with your team to monitor your data: invite team members, and integrate with Slack|
- Install Soda SQL and sign up for a free Soda Cloud account at cloud.soda.io.
- Contribute to Soda SQL development on GitHub: github.com/sodadata/soda-sql
- Automatically detect anomalies in your data using Soda Cloud.
- Need help? Join the Soda community on Slack.
Last modified on 16-Jul-21