Quick start tutorial for Soda Cloud

Sign up for a free Soda Cloud account, add datasets to your account, integrate with Slack, then create a monitor and an alert to begin monitoring your data.

Sign up and add datasets

All the instructions in this tutorial reference a PostgreSQL data source, but you can use your own data source for this tutorial. Connect to any of the following Soda-compatible data sources to which you have access.

Amazon Athena
Amazon Redshift
Apache Hive
GCP BigQuery
Microsoft SQL Server
PostgreSQL
Snowflake

  1. If you have not already done so, create a free Soda Cloud account at cloud.soda.io.
  2. In Soda Cloud, navigate to Datasets, then click Add Datasets and follow the guided steps to connect a data source. For reference, use the following input values for the Connection Details of a PostgreSQL data source.
    • Data source name: demodata
    • Data source type: PostgreSQL
    • Host: localhost
    • Username: sodasql (see note 1)
    • Password: e.g., abc123 (see note 1)
    • Database: sodasql
    • Schema: public
  3. Click Next to access Import Settings. In this panel, you have the option to instruct Soda Cloud to automatically discover new datasets as they are added to your data source, and/or define filters to limit the data Soda Cloud scans. For this tutorial, click Next to skip ahead.
  4. Define the default Scan Schedule for all datasets in your data source. Set the values to scan the data source every hour, starting at the top of the next hour in your time zone, then Save.
    For this tutorial, you are setting the schedule to run as soon as possible after you have completed this step. When Soda Cloud runs the scheduled scan, it automatically discovers all the datasets in the data source (such as all the tables in a PostgreSQL warehouse) and makes the details available to you so that you can create a new monitor.
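The "top of the next hour" start time in step 4 can be worked out as in the following minimal sketch. It is only an illustration of the schedule arithmetic, not how Soda Cloud computes its schedule; the function names `next_top_of_hour` and `hourly_scan_times` are hypothetical, and the code uses whatever time zone the `datetime` values you pass in already represent.

```python
from datetime import datetime, timedelta


def next_top_of_hour(now: datetime) -> datetime:
    """Return the first whole hour strictly after `now`."""
    # Drop minutes, seconds, and microseconds, then step forward one hour.
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def hourly_scan_times(now: datetime, count: int) -> list[datetime]:
    """List the first `count` hourly scan times after `now` (hypothetical helper)."""
    first = next_top_of_hour(now)
    return [first + timedelta(hours=i) for i in range(count)]


if __name__ == "__main__":
    # Print the next three hourly scan times relative to the local clock.
    for scan_time in hourly_scan_times(datetime.now(), 3):
        print(scan_time.isoformat(timespec="minutes"))
```

For example, if you save the schedule at 10:42, the first scan in this sketch falls at 11:00 and the second at 12:00, which is why the monitor you create later is first tested by the second scheduled scan.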

Refer to Add datasets in Soda Cloud for further details.

Note 1: Stored in a secure database; the password is encrypted.
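If you want to check the example connection values from step 2 with another PostgreSQL client before entering them, it may help to see them assembled into a standard PostgreSQL connection URI. The sketch below is an illustration only: the `ConnectionDetails` class and the `to_postgres_uri` function are hypothetical names, the port 5432 is an assumption (it is PostgreSQL's default, but the tutorial does not state a port), and none of this reflects how Soda Cloud itself connects.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class ConnectionDetails:
    """Hypothetical container for the tutorial's Connection Details fields."""
    host: str
    username: str
    password: str
    database: str
    schema: str
    port: int = 5432  # assumption: PostgreSQL's default port, not given in the tutorial


def to_postgres_uri(details: ConnectionDetails) -> str:
    """Build a postgresql:// URI; the schema is not part of the URI path."""
    # Percent-encode values that may contain reserved characters such as "@" or "/".
    user = quote(details.username, safe="")
    password = quote(details.password, safe="")
    database = quote(details.database, safe="")
    return f"postgresql://{user}:{password}@{details.host}:{details.port}/{database}"


if __name__ == "__main__":
    demo = ConnectionDetails(
        host="localhost",
        username="sodasql",
        password="abc123",  # example value from the tutorial; use your own
        database="sodasql",
        schema="public",
    )
    print(to_postgres_uri(demo))
```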

Integrate with Slack

While you wait for Soda Cloud to complete its first scheduled scan of your data source, connect your Soda Cloud account to your Slack workspace. Making this connection enables you to send Slack notifications to your team when a data issue triggers an alert.

If you do not use Slack, Soda Cloud notifies you and any teammates you invite via email.

  1. In Soda Cloud, navigate to your avatar > Organization Settings > Integrations, then follow the guided steps to authorize Soda Cloud to connect to your Slack workspace.
  2. Select all the Slack channels to which you might send notifications when Soda finds an issue with your data, then Save.

Create a monitor and alert

After Soda Cloud completes its first scheduled scan of your data source, you can use the data and metadata it collected, such as column names and data types, to create a monitor and alert.

Note that Soda Cloud also automatically created a row count anomaly detection monitor for each dataset that contains time-series data. This enables Soda Cloud to start learning row count patterns in your dataset over the course of the next few scheduled scans and surface anything it recognizes as anomalous. See anomaly detection for details.

For a new monitor, you define several details including which data to test, what tests to run, and whom to notify when bad data triggers an alert.

  1. In Soda Cloud, navigate to the Monitors dashboard, then click the stacked dots to Create Monitor. Select the type Metric, then follow the guided steps to complete the setup. Use the following input values for reference.
    • Dataset: demodata
    • Metric Type: Row Count
      (For datasets you added via Soda Cloud, you can only select Row Count for this field. Soon, Soda Cloud will make more Metric Types available for selection for all datasets.)
    • Column: n/a
    • Evaluation type: Threshold
    • Critical Alert: if less than; 1
    • Add people, roles or channels to alert: your Slack channel, if using Slack
    • Notify about: Critical Alerts
    • Frequency: Immediately
  2. When Soda Cloud runs its next scheduled scan of your data source, it runs the test you just created in your monitor. If the test fails (which, with the example input, would indicate that your dataset is empty), the failure triggers the alert you defined and sends a notification to the Slack channel you identified in your monitor, or your email address if you do not use Slack.
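The logic of the example monitor above (a Row Count metric with a Critical Alert threshold of "if less than 1", notifying Slack or falling back to email) can be sketched as follows. This is a simplified illustration under stated assumptions, not Soda Cloud's implementation; the function names, the result strings "critical" and "pass", and the sample channel and email address are all hypothetical.

```python
from typing import Optional


def evaluate_row_count(row_count: int, critical_below: int = 1) -> str:
    """Return "critical" if the row count is under the threshold, else "pass"."""
    if row_count < 0:
        raise ValueError("row_count cannot be negative")
    return "critical" if row_count < critical_below else "pass"


def notification_target(slack_channel: Optional[str], email: str) -> str:
    """Prefer the Slack channel when one is configured; otherwise use email."""
    return slack_channel if slack_channel else email


def run_monitor(row_count: int, slack_channel: Optional[str], email: str) -> Optional[str]:
    """Return a notification message for a critical result, or None on a pass."""
    if evaluate_row_count(row_count) == "critical":
        target = notification_target(slack_channel, email)
        return f"Critical alert sent to {target}: row count is {row_count}"
    return None


if __name__ == "__main__":
    # An empty dataset fails the test; a non-empty dataset produces no alert.
    print(run_monitor(0, "#data-alerts", "you@example.com"))
    print(run_monitor(250, "#data-alerts", "you@example.com"))
```

With a threshold of 1, only an empty dataset (a row count of 0) produces a critical result, which matches the tutorial's note that a failed test would indicate that your dataset is empty.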

Refer to Create monitors and alerts for further details.

Review your scan results

When Soda Cloud completes its second scheduled scan of your data source, it runs your test and presents the results in the Monitors dashboard.

  1. Review the results of your test in the Monitor Results table in Soda Cloud to find the result for the monitor you just created. See the example below in which a test passed.
    [Image: tutorial-monitor-results]
  2. Click the monitor result to access details that can help you diagnose and solve the data issue.
  3. Check your Slack channel or email inbox. If the test failed, the scan surfaced a data issue that triggered your alert, so Soda Cloud sent a notification.

Last modified on 16-Jul-21

Was this documentation helpful?
Give us your feedback in the #soda-docs channel in the Soda community on Slack or open an issue in GitHub.