
Deploy a Soda Agent in a Kubernetes cluster

Last modified on 27-Sep-23

The Soda Agent is a tool that empowers Soda Cloud users to securely access data sources to scan for data quality.

These deployment instructions offer generic guidance for setting up a Kubernetes cluster and deploying a Soda Agent in it. Alternatively, you may wish to use the set of instructions specific to your cloud service provider.


Deployment overview
Compatibility
Prerequisites
Create a Soda Cloud account and API keys
Create a Kubernetes cluster
Deploy a Soda Agent
    Deploy using CLI only
    Deploy using a values YAML file
(Optional) Create a practice data source
About the helm install command
Decommission the Soda Agent and cluster
Troubleshoot deployment
Go further

Deployment overview

  1. (Optional) Familiarize yourself with basic Soda, Kubernetes, and Helm concepts.
  2. Install, or confirm the installation of, a few required command-line tools.
  3. Sign up for a Soda Cloud account and create new API keys.
  4. Use the command-line to create a Kubernetes cluster.
  5. Deploy the Soda Agent in the new cluster.
  6. Verify the existence of your new Soda Agent in your Soda Cloud account.

Compatibility

Soda supports Kubernetes cluster version 1.21 or greater.

You can deploy a Soda Agent to connect with the following data sources:

Amazon Athena
Amazon Redshift
Azure Synapse (Experimental)
ClickHouse (Experimental)
Denodo (Experimental)
Dremio
DuckDB (Experimental)
GCP BigQuery
Google CloudSQL
IBM DB2
MS SQL Server †
MySQL
OracleDB
PostgreSQL
Snowflake
Trino
Vertica (Experimental)

† MS SQL Server with Windows Authentication does not work with Soda Agent out-of-the-box.

Prerequisites

  • (Optional) You have familiarized yourself with basic Soda, Kubernetes, and Helm concepts.
  • You have installed v1.22 or v1.23 of kubectl. This is the command-line tool you use to run commands against Kubernetes clusters. If you have installed Docker Desktop, kubectl is included out-of-the-box. With Docker running, use the command kubectl version --output=yaml to check the version of an existing install.
  • You have installed Helm. This is the package manager for Kubernetes which you will use to deploy the Soda Agent Helm chart. Run helm version to check the version of an existing install.
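
The prerequisite checks above can be scripted. The following is a minimal sketch, not part of the Soda tooling: the function names have_tool and kubectl_version_supported are hypothetical, and the version check assumes a version string of the form v1.22.4, which you would copy from the output of kubectl version.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command-line tool is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeed if a kubectl version string such as
# "v1.22.4" is one of the versions named in the prerequisites (1.22 or 1.23).
kubectl_version_supported() {
  case "$1" in
    v1.22.*|v1.23.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report which of the required tools are installed.
for tool in kubectl helm; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

For example, kubectl_version_supported v1.23.0 succeeds, while kubectl_version_supported v1.21.9 fails.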

Create a Soda Cloud account and API keys

The Soda Agent communicates with your Soda Cloud account using API public and private keys. Note that the keys a Soda Agent uses are different from the API keys Soda Library uses to connect to Soda Cloud.

  1. If you have not already done so, create a Soda Cloud account at cloud.soda.io.
  2. In your Soda Cloud account, navigate to your avatar > Scans & Data, then navigate to the Agents tab. Click New Soda Agent.
  3. The dialog box that appears offers abridged instructions to set up a new Soda Agent from the command-line; more thorough instructions exist in this documentation, below.

    For now, copy and paste the values for both the API Key ID and API Key Secret to a temporary, secure place in your local environment. You will need these values in the next section when you deploy the agent in your Kubernetes cluster.
  4. You can keep the dialog box open in Soda Cloud, or close it.

Create a Kubernetes cluster

To deploy a Soda Agent in a Kubernetes cluster, you must first create a cluster.

To create a cluster for testing purposes, you can use a tool such as Minikube, microk8s, kind, k3s, or Docker Desktop, or use an existing cluster to which you have pointed a working kubectl.

Because the procedure to create a cluster varies depending upon your cloud services provider, the instructions below offer a simple way of creating a cluster using Minikube on which you can deploy a Soda Agent locally. Refer to the Kubernetes documentation for details.

  1. Install Minikube, which you use to create a Kubernetes cluster that runs locally.
  2. Run the following command to create your local Kubernetes cluster. Be aware that this activity can take a while. Be patient!
    minikube start --driver=docker
    
    ...
    🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
    
  3. To connect to the newly created cluster and create a namespace, use the following command.
    minikube kubectl -- create namespace soda-agent
    
  4. Run the following command to change the context to associate the current namespace to soda-agent.
    minikube kubectl -- config set-context --current --namespace=soda-agent
    
  5. Run the following command to verify that kubectl recognizes soda-agent as the current namespace.
    minikube kubectl -- config get-contexts
    
    CURRENT   NAME               CLUSTER          AUTHINFO          NAMESPACE
    *         minikube           minikube         minikube          soda-agent
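
As a scripted alternative to reading the get-contexts table, the sketch below prints the namespace of the current context and checks that it is soda-agent. The function names are hypothetical, and the sketch assumes a kubectl binary on the PATH; if you only have the kubectl bundled with Minikube, substitute minikube kubectl -- for kubectl.

```shell
# Hypothetical helper: print the namespace of the current kubectl context.
# Prints nothing if the context has no namespace set.
current_namespace() {
  kubectl config view --minify --output 'jsonpath={..namespace}'
}

# Hypothetical helper: succeed only if the current namespace is soda-agent.
namespace_is_soda_agent() {
  [ "$(current_namespace)" = "soda-agent" ]
}

# Usage (run after step 4):
#   namespace_is_soda_agent && echo "namespace OK"
```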
    

Deploy a Soda Agent

The following table outlines the two ways you can install the Helm chart to deploy a Soda Agent in your cluster.

| Method | Description | When to use |
| --- | --- | --- |
| CLI only | Install the Helm chart via CLI by providing values directly in the install command. | Use this as a straightforward way of deploying an agent on a cluster in a secure or local environment. |
| Use a values YAML file | Install the Helm chart via CLI by providing values in a values YAML file. | Use this as a way of deploying an agent on a cluster while keeping sensitive values secure: provide sensitive API key values in this local file; store data source login credentials as environment variables in this local file or in an external secrets manager. Soda needs access to the credentials to be able to connect to your data source to run scans of your data. See: Manage sensitive values. |

Deploy using CLI only

  1. Add the Soda Agent Helm chart repository.
    helm repo add soda-agent https://helm.soda.io/soda-agent/
    
  2. Use the following command to install the Helm chart to deploy a Soda Agent in your cluster. (Learn more about the helm install command.)
    • Replace the values of soda.apikey.id and soda.apikey.secret with the values you copied from the New Soda Agent dialog box in your Soda Cloud account.
    • Replace the value of soda.agent.name with a custom name for your agent, if you wish.
    • Specify the value for soda.cloud.endpoint according to your local region: https://cloud.us.soda.io for the United States, or https://cloud.soda.io for all else.
    • Optionally, add soda.scanlauncher settings to configure idle workers in the cluster. Launch an idle worker so that at scan time, the agent passes instructions to an already-running idle Scan Launcher and avoids the time-consuming task of starting the pod from scratch. This helps your Soda Cloud test scans run faster. If you wish, you can configure multiple idle scan launchers waiting for instructions.
      # Use https://cloud.us.soda.io for US region; use https://cloud.soda.io for EU region
      helm install soda-agent soda-agent/soda-agent \
       --set soda.agent.name=myuniqueagent \
       --set soda.cloud.endpoint=https://cloud.soda.io \
       --set soda.apikey.id=*** \
       --set soda.apikey.secret=**** \
       --set soda.scanlauncher.idle.enabled=true \
       --set soda.scanlauncher.idle.replicas=1 \
       --namespace soda-agent
      

      The command-line produces output like the following message:

      NAME: soda-agent
      LAST DEPLOYED: Thu Jun 16 15:03:10 2022
      NAMESPACE: soda-agent
      STATUS: deployed
      REVISION: 1
      
  3. (Optional) Validate the Soda Agent deployment by running the following command:
    minikube kubectl -- describe pods
    
  4. In your Soda Cloud account, navigate to your avatar > Scans & Data > Agents tab. Refresh the page to verify that you see the agent you just created in the list of Agents.

    Be aware that the agent may take several minutes to appear in your list of Soda Agents. Use the describe pods command in step 3 to check the status of the deployment. When the output shows State: Running and Ready: True, refresh the page to see the agent in Soda Cloud.
    ...
    Containers:
      soda-agent-orchestrator:
         Container ID:   docker://081*33a7
         Image:          sodadata/agent-orchestrator:latest
         Image ID:       docker-pullable://sodadata/agent-orchestrator@sha256:394e7c1**b5f
         Port:           <none>
         Host Port:      <none>
         State:          Running
           Started:      Thu, 16 Jun 2022 15:50:28 -0700
         Ready:          True
         ...
    


  5. Next: Add a data source in Soda Cloud using the Soda Agent you just deployed. If you wish, you can first create a practice data source to try this out.
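
The CLI-only install can also be wrapped in a small script so that the API key values come from environment variables rather than being typed into the command text. This is a sketch under stated assumptions: the function names and the variable names SODA_API_KEY_ID, SODA_API_KEY_SECRET, SODA_AGENT_NAME, and SODA_REGION are hypothetical, not something Soda defines; the endpoint rule is the one described in step 2.

```shell
# Hypothetical helper: map a region code to the Soda Cloud endpoint
# (United States vs. everything else).
soda_endpoint() {
  case "$1" in
    us) echo "https://cloud.us.soda.io" ;;
    *)  echo "https://cloud.soda.io" ;;
  esac
}

# Hypothetical helper: install the chart using values from the environment.
# The variable names are assumptions made for this sketch.
install_soda_agent() {
  if [ -z "${SODA_API_KEY_ID:-}" ] || [ -z "${SODA_API_KEY_SECRET:-}" ]; then
    echo "Set SODA_API_KEY_ID and SODA_API_KEY_SECRET first" >&2
    return 1
  fi
  helm install soda-agent soda-agent/soda-agent \
    --set soda.agent.name="${SODA_AGENT_NAME:-myuniqueagent}" \
    --set soda.cloud.endpoint="$(soda_endpoint "${SODA_REGION:-eu}")" \
    --set soda.apikey.id="$SODA_API_KEY_ID" \
    --set soda.apikey.secret="$SODA_API_KEY_SECRET" \
    --namespace soda-agent
}

# Usage, after setting the two key variables:
#   SODA_REGION=us install_soda_agent
```

The function refuses to run when either key variable is empty, which avoids installing a release that cannot connect to Soda Cloud.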

Deploy using a values YAML file

  1. Using a code editor, create a new YAML file called values.yml.
  2. In that file, copy+paste the content below, then adjust the following values:
    • Replace the values of id and secret with the values you copied from the New Soda Agent dialog box in your Soda Cloud account.
    • Replace the value of name with a custom name for your agent, if you wish.
    • Specify the value for endpoint according to your local region: https://cloud.us.soda.io for the United States, or https://cloud.soda.io for all else.
    • Optionally, add soda.scanlauncher settings to configure idle workers in the cluster. Launch an idle worker so that at scan time, the agent passes instructions to an already-running idle Scan Launcher and avoids the time-consuming task of starting the pod from scratch. This helps your Soda Cloud test scans run faster. If you wish, you can configure multiple idle scan launchers waiting for instructions.
      soda:
        apikey:
          id: "***"
          secret: "***"
        agent:
          name: "myuniqueagent"
        scanlauncher:
          idle:
            enabled: true
            replicas: 1
        cloud:
          # Use https://cloud.us.soda.io for US region; use https://cloud.soda.io for EU region
          endpoint: "https://cloud.soda.io"
      
  3. Save the file. Then, in the same directory in which the values.yml file exists, use the following command to install the Soda Agent Helm chart.
    helm install soda-agent soda-agent/soda-agent \
      --values values.yml \
      --namespace soda-agent
    
  4. (Optional) Validate the Soda Agent deployment by running the following command:
    minikube kubectl -- describe pods
    
  5. In your Soda Cloud account, navigate to your avatar > Scans & Data > Agents tab. Refresh the page to verify that you see the agent you just created in the list of Agents.

    Be aware that the agent may take several minutes to appear in your list of Soda Agents. Use the describe pods command in step 4 to check the status of the deployment. When the output shows State: Running and Ready: True, refresh the page to see the agent in Soda Cloud.
    ...
    Containers:
      soda-agent-orchestrator:
         Container ID:   docker://081*33a7
         Image:          sodadata/agent-orchestrator:latest
         Image ID:       docker-pullable://sodadata/agent-orchestrator@sha256:394e7c1**b5f
         Port:           <none>
         Host Port:      <none>
         State:          Running
           Started:      Thu, 16 Jun 2022 15:50:28 -0700
         Ready:          True
         ...
    


  6. Next: Add a data source in Soda Cloud using the Soda Agent you just deployed. If you wish, you can first create a practice data source to try this out.
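
If you prefer not to type the API key values into an editor, a script can generate the values file instead. The sketch below is one possible approach, not a Soda-provided tool: the function name write_values_file and the environment variable names are hypothetical, and it assumes the key values contain no double quotes.

```shell
# Hypothetical helper: write a values file for the Soda Agent Helm chart.
# Reads SODA_API_KEY_ID and SODA_API_KEY_SECRET (assumed variable names)
# from the environment; $1 is the output path (default: values.yml).
write_values_file() {
  out="${1:-values.yml}"
  cat > "$out" <<EOF
soda:
  apikey:
    id: "${SODA_API_KEY_ID}"
    secret: "${SODA_API_KEY_SECRET}"
  agent:
    name: "${SODA_AGENT_NAME:-myuniqueagent}"
  cloud:
    endpoint: "${SODA_CLOUD_ENDPOINT:-https://cloud.soda.io}"
EOF
  # Restrict the file to the current user, since it holds the API secret.
  chmod 600 "$out"
}

# Usage, after setting the two key variables:
#   write_values_file values.yml
```

The resulting file can be passed to helm install with --values, as in step 3.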

If you use private key authentication with a Soda Agent, refer to Manage sensitive values for a Soda Agent.

(Optional) Create a practice data source

If you wish to try creating a new data source in Soda Cloud using the agent you deployed, you can use the following command to create a PostgreSQL warehouse containing example data from the NYC Bus Breakdowns and Delay Dataset.

From the command-line, copy+paste and run the following to create the data source as a pod on your new cluster.

cat <<EOF | kubectl apply -n soda-agent -f -
---
apiVersion: v1
kind: Pod
metadata:
  name: nybusbreakdowns
  labels:
    app: nybusbreakdowns
spec:
  containers:
  - image: sodadata/nybusbreakdowns
    imagePullPolicy: IfNotPresent
    name: nybusbreakdowns
    ports:
    - name: tcp-postgresql
      containerPort: 5432
  restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nybusbreakdowns
  name: nybusbreakdowns
spec:
  ports:
  - name: tcp-postgresql
    port: 5432
    protocol: TCP
    targetPort: tcp-postgresql
  selector:
    app: nybusbreakdowns
  type: ClusterIP
EOF

Output:

pod/nybusbreakdowns created
service/nybusbreakdowns created
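
The created message only confirms that the cluster accepted the objects; the pod may still be starting. As a sketch, the following waits until the pod reports Ready using kubectl wait. The function name and the default timeout of 120s are arbitrary choices for this example.

```shell
# Hypothetical helper: block until the practice pod reports Ready.
# $1 is an optional timeout (default 120s, an arbitrary choice).
wait_for_practice_pod() {
  kubectl wait --for=condition=Ready pod/nybusbreakdowns \
    --namespace soda-agent --timeout="${1:-120s}"
}

# Usage:
#   wait_for_practice_pod 300s
```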


Once the pod of practice data is running, you can use the following configuration details when you add a data source in Soda Cloud, in step 2, Connect the Data Source.

data_source your_datasource_name:
  type: postgres
  connection:
    host: nybusbreakdowns
    port: 5432
    username: sodacore
    password: sodacore
    database: sodacore
    schema: new_york

About the helm install command

helm install soda-agent soda-agent/soda-agent \
  --set soda.agent.target=azure-aks-virtualnodes \
  --set soda.agent.name=myuniqueagent \
  --set soda.apikey.id=*** \
  --set soda.apikey.secret=**** \
  --namespace soda-agent

| Command part | Description |
| --- | --- |
| helm install | the action helm is to take |
| soda-agent (the first one) | a release named soda-agent on your cluster |
| soda-agent (the second one) | the name of the helm repo you installed |
| soda-agent (the third one) | the name of the helm chart that is the Soda Agent |

The --set options either override or set some of the values defined in and used by the Helm chart. You can override these values with the --set flags as this command does, or you can specify the override values using a values.yml file.

| Parameter key | Parameter value, description |
| --- | --- |
| --set soda.agent.target | (Optional) The cluster that the command targets. Use when deploying to aws-eks or azure-aks-virtualnodes. |
| --set soda.agent.name | A unique name for your Soda Agent. Choose any name you wish, as long as it is unique in your Soda Cloud account. |
| --set soda.apikey.id | With the apikey.secret, this connects the Soda Agent to your Soda Cloud account. Use the value you copied from the dialog box in Soda Cloud when adding a new agent. You can use a values.yml file to pass this value to the cluster instead of exposing it here. |
| --set soda.apikey.secret | With the apikey.id, this connects the Soda Agent to your Soda Cloud account. Use the value you copied from the dialog box in Soda Cloud when adding a new agent. You can use a values.yml file to pass this value to the cluster instead of exposing it here. |
| --set soda.scanlauncher.idle.enabled=true | (Optional) Launch an idle worker so at scan time, the agent can hand over instructions to an already running idle scan launcher to avoid the start-from-scratch setup time for a pod. You can have multiple idle scan launchers waiting for instructions. |
| --set soda.scanlauncher.idle.replicas=1 | (Optional) Replicate an idle worker to have more workers ready to handle instructions without setting up a new pod. |
| --namespace soda-agent | Use the namespace value to identify the namespace in which to deploy the agent. |


Decommission the Soda Agent and cluster

  1. Uninstall the Soda Agent in the cluster.
    helm delete soda-agent -n soda-agent
    
  2. Delete the cluster.
    minikube delete
    
    💀  Removed all traces of the "minikube" cluster.
    

Troubleshoot deployment

Refer to Helpful kubectl commands for instructions on accessing logs and investigating issues.

Problem: Scans launched from Soda Cloud take an excessive amount of time to run.

Solution: Consider adjusting the number of replicas for idle workers with kubectl. Launch extra idle workers so at scan time, the agent can hand over instructions to an already running idle scan launcher to avoid the start-from-scratch setup time for a pod.

  1. Ensure that the agent was deployed with the soda.scanlauncher.idle configurations for enabled: true and replicas: 1 or more.
  2. Run the following command to increase the number of active replicas to 2.
    kubectl scale deployment/soda-agent-scanlauncher \
      --replicas 2 -n soda-agent
    

    Be aware that this adjustment lasts only until you change another configuration value via Helm.
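
Because a manual kubectl scale lasts only until the next change made via Helm, one option is to set the replica count through Helm itself. The sketch below uses helm upgrade with --reuse-values so that the other values of the existing release are kept. It assumes the release is named soda-agent, as in the install commands above, and that soda.scanlauncher.idle.replicas is the chart value that controls this deployment; the function name is hypothetical.

```shell
# Hypothetical helper: set the idle scan launcher replica count through Helm
# so that the value is recorded in the release. $1 is the replica count.
set_idle_replicas() {
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: set_idle_replicas COUNT" >&2
      return 1
      ;;
  esac
  helm upgrade soda-agent soda-agent/soda-agent \
    --reuse-values \
    --set soda.scanlauncher.idle.replicas="$1" \
    --namespace soda-agent
}

# Usage:
#   set_idle_replicas 2
```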


Problem: After setting up a cluster and deploying the agent, you are unable to see the agent running in Soda Cloud.

Solution: The value you specify for soda.cloud.endpoint must correspond with the region you selected when you signed up for a Soda Cloud account:

  • Use https://cloud.us.soda.io for the United States
  • Use https://cloud.soda.io for all else
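
To see which endpoint an existing release was installed with, you can inspect the values you supplied to it. The following is a sketch that assumes the release is named soda-agent and that you set the endpoint yourself via --set or a values file, since helm get values lists user-supplied values only. The function name is hypothetical, and the sed filter simply prints any line with an endpoint key.

```shell
# Hypothetical helper: print the endpoint value supplied to the soda-agent
# release, with surrounding double quotes removed.
deployed_endpoint() {
  helm get values soda-agent --namespace soda-agent --output yaml |
    sed -n 's/^ *endpoint: *//p' | tr -d '"'
}

# Usage:
#   deployed_endpoint
```

If the printed URL does not match your account's region, reinstall or upgrade the release with the correct soda.cloud.endpoint value.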



Go further



Documentation always applies to the latest version of Soda products