Last modified on 30-Nov-23
Use this guide to set up Soda Cloud so that users across your organization can test data quality themselves.
Deploy a Soda Agent in a Kubernetes cluster to connect to both a data source and Soda Cloud, then invite your Data Analyst and Data Scientist colleagues to join the account to create agreements and begin writing their own SodaCL checks for data quality.
(Not quite ready for this big gulp of Soda? 🥤Try taking a sip, first.)
The instructions below offer Data Engineers an example of how to set up Soda Cloud so that colleagues can prepare their own data quality tests. After all, data quality testing is a team sport!
For context, the example assumes that you have appropriate access to a cloud services provider environment, such as Azure, AWS, or Google Cloud, that allows you to create and deploy applications to a cluster. It also assumes that you, or someone on your team, has the login credentials that Soda needs to access a data source, such as MS SQL Server, BigQuery, or Athena, so that Soda can run scans of the data.
Once you have completed the setup, you can direct your colleagues to log in to Soda Cloud and begin creating agreements. An agreement is a contract between stakeholders that stipulates the expected and agreed-upon state of data quality in a data source. It contains data quality checks that run according to the schedule you defined for the data source.
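For illustration, the checks in an agreement are written in SodaCL, which is YAML-based. The sketch below assumes a hypothetical dataset named `dim_customer` with `email_address` and `customer_id` columns; none of these names come from this guide, so substitute your own.

```yaml
# Hypothetical dataset and column names, for illustration only
checks for dim_customer:
  # Fail if the dataset contains no rows
  - row_count > 0
  # Fail if any email address is missing
  - missing_count(email_address) = 0
  # Fail if any customer ID appears more than once
  - duplicate_count(customer_id) = 0
```

Each check states the condition that the data is expected to meet; a scan reports a failure when the condition does not hold.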
When checks fail during data quality scans, you and your colleagues receive alerts via Slack so that you can address issues before they have a downstream impact on the users or systems that depend upon the data.
The Soda Agent is a tool that empowers Soda Cloud users to securely access data sources to scan for data quality. Create a Kubernetes cluster in a cloud services provider environment, then use Helm to deploy a Soda Agent in the cluster.
Refer to the detailed deployment instructions for the cloud services provider you use:
- Cloud services provider-agnostic instructions
- Amazon Elastic Kubernetes Service (EKS)
- Microsoft Azure Kubernetes Service (AKS)
- Google Kubernetes Engine (GKE)
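As a rough sketch of what the Helm deployment involves, the agent's chart is typically given a values file that names the agent and supplies Soda Cloud API keys. The key names below are assumed to follow the pattern used in the Soda Agent deployment instructions, and every value is a placeholder; confirm both against the instructions for your provider before using them.

```yaml
# values.yml (hypothetical file name); all values are placeholders
# Key names are assumptions; verify them in the Soda Agent deployment instructions
soda:
  apikey:
    id: "<your-soda-cloud-api-key-id>"
    secret: "<your-soda-cloud-api-key-secret>"
  agent:
    # A name that identifies this agent in Soda Cloud
    name: "my-soda-agent"
```

A file like this is passed to `helm install` with its `--values` flag, which keeps credentials out of the command line and shell history.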
The Soda Agent supports connections with the following data sources.
- Amazon Athena
- GCP BigQuery
- IBM DB2
- MS SQL Server [1]

[1] MS SQL Server with Windows Authentication does not work with Soda Agent out-of-the-box.
- Log in to your Soda Cloud account, then navigate to your avatar > Data Sources.
- In the Agents tab, confirm that you can see the Soda Agent you deployed and that its status is “green” in the Last Seen column. If not, refer to the Soda Agent documentation to troubleshoot its status.
- Navigate to the Data source tab, then click New Data Source and follow the guided steps to:
  - identify the new data source and its default scan schedule
  - provide connection configuration details for the data source, and test the connection to the data source
  - profile the datasets in the data source to gather basic metadata about the contents of each
  - identify the datasets to which you wish to apply automated monitoring for anomalies and schema changes
  - assign ownership roles for the data source and its datasets
- Save the new data source.
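The connection configuration requested in the guided steps is expressed as YAML. As a hedged sketch, a configuration for an MS SQL Server data source might look like the following. The data source name, host, database, and environment variable names are all hypothetical, and the key names are assumed from Soda's SQL Server connection settings, so check the connection reference for your data source type.

```yaml
# Hypothetical names and placeholder values throughout
data_source my_sqlserver:
  type: sqlserver
  host: db.example.com
  port: '1433'
  # Assumes these environment variables are made available to the Soda Agent
  username: ${SQLSERVER_USERNAME}
  password: ${SQLSERVER_PASSWORD}
  database: sales
  schema: dbo
```

Referencing credentials through environment variables, rather than typing them into the configuration, keeps secrets out of the stored data source definition.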
Use this integration to enable Soda to send alert notifications to a Slack channel to notify your team of warn and fail check results. If your team does not use Slack, skip this step; Soda then sends alert notifications via email.
- Log in to your Soda Cloud account and navigate to your avatar > Organization Settings, then navigate to the Integrations tab and click the + icon to add a new integration.
- Follow the guided steps to authorize Soda to connect to your Slack workspace. If necessary, contact your organization’s Slack Administrator to approve the integration with Soda.
  - Configuration tab: select the public channels to which Soda can post messages; Soda cannot post to private channels.
  - Scope tab: select the Soda features, both alert notifications and incidents, which can access the Slack integration.
- To specify where Soda sends alert notifications for checks that fail, create a new notification rule. Navigate to your avatar > Notification Rules, then click New Notification Rule. Follow the guided steps to complete the new rule, directing Soda to send failed check results to a specific channel in your Slack workspace.
After testing and saving the new data source, invite your colleagues to your Soda Cloud account so they can begin creating new agreements.
Navigate to your avatar > Invite Team Members, then complete the form to send invitations to your colleagues. Provide them with the following links to help them get started:
✨Well done!✨ You’ve taken the first step towards a future in which you and your colleagues can collaborate on defining and maintaining good-quality data. Huzzah!
- Get organized in Soda!
- Integrate Soda with your data catalog.
- Use failed row samples to investigate data quality issues.
- Request a demo. Hey, what can Soda do for you?
- Join the Soda community on Slack.
Documentation always applies to the latest version of Soda products