
Deploy a Soda Agent in Azure AKS preview

Last modified on 26-Jan-23

The Soda Agent is a tool that empowers Soda Cloud users to securely access data sources to scan for data quality. Create an Azure Kubernetes Service (AKS) cluster, then use Helm to deploy a Soda Agent in the cluster.

This setup enables Soda Cloud users to securely connect to data sources (Snowflake, MS SQL Server, etc.) from within the Soda Cloud web application. Any user in your Soda Cloud account can add a new data source via the agent, then write their own agreements to check for data quality in the new data source.

Deployment overview
Compatibility
Prerequisites
Create a Soda Cloud account and API keys
Create an AKS cluster
    Create a regular cluster
    Create a virtual cluster
Deploy a Soda Agent
    Deploy using CLI only - regular cluster
    Deploy using CLI only - virtual cluster
    Deploy using a values YAML file
(Optional) Create a practice data source
About the helm install command
Decommission the Soda Agent and the AKS cluster
Troubleshoot deployment
Go further

Deployment overview

  1. (Optional) Familiarize yourself with basic Soda, Kubernetes, and Helm concepts.
  2. Install, or confirm the installation of, a few required command-line tools.
  3. Sign up for a Soda Cloud account and create new API keys.
  4. Use the command-line to create a Kubernetes cluster.
  5. Deploy the Soda Agent in the new cluster.
  6. Verify the existence of your new Soda Agent in your Soda Cloud account.

Compatibility

Soda supports Kubernetes cluster version 1.21 or greater.

You can deploy a Soda Agent to connect with the following data sources:

Amazon Athena
Amazon Redshift
Azure Synapse (Experimental)
ClickHouse (Experimental)
Denodo (Experimental)
Dremio
DuckDB (Experimental)
GCP Big Query
IBM DB2
MS SQL Server †
MySQL
OracleDB
PostgreSQL
Snowflake
Trino
Vertica (Experimental) - Coming soon

† MS SQL Server with Windows Authentication does not work with Soda Agent out-of-the-box.

Prerequisites

  • (Optional) You have familiarized yourself with basic Soda, Kubernetes, and Helm concepts.
  • You have an Azure account and the necessary permissions to enable you to create an AKS cluster in your region. Consult the Azure access control documentation for details.
  • You have installed the Azure CLI tool. This is the command-line tool you need to access your Azure account from the command-line. Run az --version to check the version of an existing install. Consult the Azure Command-Line Interface documentation for details.
  • You have logged in to your Azure account. Run az login to open a browser and log in to your account.
  • You have installed v1.22 or v1.23 of kubectl. This is the command-line tool you use to run commands against Kubernetes clusters. If you have already installed the Azure CLI tool, you can install kubectl using the following command: az aks install-cli.
    Run kubectl version --output=yaml to check the version of an existing install.
  • You have installed Helm. This is the package manager for Kubernetes which you will use to deploy the Soda Agent Helm chart. Run helm version to check the version of an existing install.
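
The tool checks above can be run in one pass. The following is a minimal sketch, not part of the Soda or Azure tooling: `require_tools` is a hypothetical helper name, and it only checks that each command is on your PATH, not that its version is suitable.

```shell
#!/bin/sh
# Report which of the given command-line tools are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
# "require_tools" is a hypothetical helper name used for illustration.
require_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the three tools this guide relies on.
# require_tools az kubectl helm
```

After a successful check, you can still run az --version, kubectl version --output=yaml, and helm version to confirm the installed versions.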

Create a Soda Cloud account and API keys

The Soda Agent communicates with your Soda Cloud account using API public and private keys. Note that the keys a Soda Agent uses are different from the API keys Soda Core uses to connect to Soda Cloud.

  1. If you have not already done so, create a Soda Cloud account at cloud.soda.io.
  2. In your Soda Cloud account, navigate to your avatar > Scans & Data > the Agents tab, then click New Soda Agent.
  3. The dialog box that appears offers abridged instructions to set up a new Soda Agent from the command-line; more thorough instructions exist in this documentation, below.

    For now, copy and paste the values for both the API Key ID and API Key Secret to a temporary, secure place in your local environment. You will need these values in the next section when you deploy the agent in your Kubernetes cluster.
  4. You can keep the dialog box open in Soda Cloud, or close it.

Create an AKS cluster

The following procedures use Azure CLI to create a resource group and cluster. Alternatively, you can tackle the same tasks using PowerShell or the Azure portal, if you prefer.

There are two ways to create a cluster: as a regular cluster, or as a virtual cluster that uses virtual nodes.

Create a regular cluster

  1. To create a cluster, you must first create a resource group which belongs to a single location. Use the following command to list the available locations for your Azure subscription and record the name of the one that best matches your location.
    az account list-locations -o table
    
    DisplayName               Name                 RegionalDisplayName
    ------------------------  -------------------  -------------------------------------
    East US                   eastus               (US) East US
    East US 2                 eastus2              (US) East US 2
    South Central US          southcentralus       (US) South Central US
    West US 2                 westus2              (US) West US 2
    West US 3                 westus3              (US) West US 3
    Australia East            australiaeast        (Asia Pacific) Australia East
    ...
    
  2. Use the following command to list the resource groups that already exist. Resource Groups are logical collections of resources that you deploy in Azure. Your role in your Azure environment dictates whether you can use an existing resource group or if you need to create a new one. The instructions that follow assume the latter.
    az group list --output table
    
  3. From the command-line, create a new resource group using the following command, replacing the value of --location with your own relevant value:
    az group create --name SodaAgent --location westeurope
    
  4. Create an AKS cluster using the following command:
    az aks create \
      --resource-group SodaAgent \
      --name SodaAgentCluster \
      --node-count 1 \
      --generate-ssh-keys
    
  5. Add the cluster credentials to your kubectl configuration so you can run kubectl and helm commands against this cluster.
    az aks get-credentials \
      --resource-group SodaAgent \
      --name SodaAgentCluster
    
    Merged "SodaAgentCluster" as current context in /Users/my_name/.kube/config
    
  6. Your default kubectl configuration now points to the newly-added cluster context. Run the following command to check the nodes in the cluster.
    kubectl get nodes
    
    NAME                                STATUS   ROLES   AGE   VERSION
    aks-nodepool1-15273607-vmss000000   Ready    agent   40m   v1.23.12
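
Steps 3 through 6 above can be wrapped in a single function if you expect to repeat them. This is a rough sketch that reuses only the commands shown in the procedure; `create_regular_cluster` is a hypothetical name, and the three arguments (resource group, cluster name, location) are yours to supply. Each command runs only if the previous one succeeded.

```shell
#!/bin/sh
# Create a resource group and a single-node AKS cluster, fetch credentials,
# then list the nodes. Stops at the first command that fails.
# Arguments: resource group, cluster name, location.
# "create_regular_cluster" is a hypothetical helper name.
create_regular_cluster() {
  rg="$1"
  cluster="$2"
  location="$3"
  az group create --name "$rg" --location "$location" &&
  az aks create \
    --resource-group "$rg" \
    --name "$cluster" \
    --node-count 1 \
    --generate-ssh-keys &&
  az aks get-credentials --resource-group "$rg" --name "$cluster" &&
  kubectl get nodes
}

# Example, using the names from this guide:
# create_regular_cluster SodaAgent SodaAgentCluster westeurope
```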
    

Create a virtual cluster

Use Azure’s ACI (Azure Container Instances) to create Kubernetes pods for a serverless approach. The virtual nodes make use of a virtual network allowing the pods in the cluster to communicate. Only AKS clusters configured with advanced networking can use virtual nodes.

  1. From the command-line, verify if the Azure Container Instances (ACI) provider is enabled for your subscription.
    az provider list --query "[?contains(namespace,'Microsoft.ContainerInstance')]" -o table
    
    Namespace                    RegistrationState    RegistrationPolicy
    ---------------------------  -------------------  --------------------
    Microsoft.ContainerInstance  Registered           RegistrationRequired
    
  2. If the command yields a NotRegistered state, use this command to enable the provider, assuming you have sufficient privileges.
    az provider register --namespace Microsoft.ContainerInstance
    
  3. Use the following command to list the available locations for your Azure subscription and record the name of the one that best matches your location.
    az account list-locations -o table
    
    DisplayName               Name                 RegionalDisplayName
    ------------------------  -------------------  -------------------------------------
    East US                   eastus               (US) East US
    East US 2                 eastus2              (US) East US 2
    South Central US          southcentralus       (US) South Central US
    West US 2                 westus2              (US) West US 2
    West US 3                 westus3              (US) West US 3
    Australia East            australiaeast        (Asia Pacific) Australia East
    ...
    
  4. Use the following command to list the resource groups that already exist. Resource Groups are logical collections of resources that you deploy in Azure. Your role in your Azure environment dictates whether you can use an existing resource group or if you need to create a new one. The instructions that follow assume the latter.
    az group list --output table
    
  5. From the command-line, create a new resource group using the following command, replacing the value of --location with your own relevant value:
    az group create --name SodaAgent --location westeurope
    
  6. Use the following command to create a virtual network. You will use this network and a subnet (see next steps) to set up virtual nodes.
    az network vnet create \
      --resource-group SodaAgent \
      --name SodaAgentVnet \
      --address-prefixes 10.100.0.0/16 \
      --subnet-name SodaAgentSubnet \
      --subnet-prefix 10.100.100.0/24
    
  7. Note the ID of the subnet which you just created, or use the following command to find it and store it in a variable. When you wish to decommission the agent and its cluster, you need the subnet and vnet IDs.
    subnetid=$(az network vnet subnet show \
      --resource-group SodaAgent \
      --vnet-name SodaAgentVnet \
      --name SodaAgentSubnet \
      --query id -o tsv)
    
  8. Create an additional subnet for the virtual nodes.
    az network vnet subnet create \
      --resource-group SodaAgent \
      --vnet-name SodaAgentVnet \
      --name SodaAgentVirtualNodeSubnet \
      --address-prefixes 10.100.101.0/24
    
  9. Use the following command to create the cluster. Note that the node count is set to 1: the cluster still runs one regular node for its own system components, meaning the cluster is not entirely serverless. For production deployments, best practice dictates that you use two or three nodes.
    Be patient as the command can take several minutes to complete.
    az aks create \
      --resource-group SodaAgent \
      --name SodaAgentCluster \
      --node-count 1 \
      --network-plugin azure \
      --vnet-subnet-id $subnetid
    
  10. Use the following command to list the enabled add-ons for the cluster.
    az aks addon list --resource-group SodaAgent --name SodaAgentCluster
    
  11. Activate the virtual nodes add-on.
    az aks enable-addons \
      --resource-group SodaAgent \
      --name SodaAgentCluster \
      --addons virtual-node \
      --subnet-name SodaAgentVirtualNodeSubnet
    
  12. Add the cluster credentials to your kubectl configuration so you can run kubectl and helm commands against this cluster.
    az aks get-credentials \
      --resource-group SodaAgent \
      --name SodaAgentCluster
    
  13. Your default kubectl configuration now points to the newly-added cluster context. Run the following command to check the nodes in the cluster.
    kubectl get nodes
    
    NAME                                STATUS   ROLES   AGE   VERSION
    aks-nodepool1-15273607-vmss000000   Ready    agent   40m   v1.23.12
    
  14. Create a namespace for the agent.
    kubectl create ns soda-agent
    
  15. Run the following command to change the context to associate the current namespace to soda-agent.
    kubectl config set-context --current --namespace=soda-agent
    
  16. Run the following command to verify that kubectl recognizes soda-agent as the current namespace for the cluster.
    kubectl config get-contexts
    

    Output:

    CURRENT   NAME                CLUSTER              AUTHINFO                                 NAMESPACE
    *         SodaAgentCluster    SodaAgentCluster     clusterUser_SodaAgent_SodaAgentCluster   soda-agent
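
Steps 1 and 2 of this procedure (check the ACI provider, then register it if needed) can be combined. The sketch below assumes that `az provider show` with `--query registrationState` returns the same state shown in the table output of step 1; `ensure_aci_registered` is a hypothetical helper name.

```shell
#!/bin/sh
# Register the Microsoft.ContainerInstance provider only if it is not
# registered yet. "ensure_aci_registered" is a hypothetical helper name.
ensure_aci_registered() {
  state=$(az provider show \
    --namespace Microsoft.ContainerInstance \
    --query registrationState -o tsv) || return 1
  if [ "$state" = "Registered" ]; then
    echo "ACI provider already registered"
  else
    echo "ACI provider state is '$state'; registering"
    az provider register --namespace Microsoft.ContainerInstance
  fi
}

# Example:
# ensure_aci_registered
```

Registration is not instantaneous, so you may need to run the check again after a short wait before you create the cluster.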
    

Deploy a Soda Agent

The following table outlines the ways you can install the Helm chart to deploy a Soda Agent in your cluster.

Method | Description | When to use
CLI only - regular cluster | Install the Helm chart via CLI by providing values directly in the install command. | Use this as a straightforward way of deploying an agent on a cluster.
CLI only - virtual cluster | Install the Helm chart via CLI by providing values directly in the install command. | Use this as a straightforward way of deploying an agent on a virtual cluster.
Use a values YAML file | Install the Helm chart via CLI by providing values in a values YAML file. | Use this as a way of deploying an agent on a cluster while keeping sensitive values secure: provide sensitive API key values in this local file, and store data source login credentials as environment variables in this local file. Soda needs access to the credentials to be able to connect to your data source to run scans of your data. See: Manage sensitive values.

Deploy using CLI only - regular cluster

  1. Use Helm to add the Soda Agent Helm chart repository.
    helm repo add soda-agent https://helm.soda.io/soda-agent/
    
  2. Use the following command to install the Helm chart which deploys a Soda Agent in your cluster. (Learn more about the helm install command.)
    • Replace the values of soda.apikey.id and soda.apikey.secret with the values you copied from the New Soda Agent dialog box in your Soda Cloud. The cluster stores these key values as Kubernetes secrets.
    • Replace the value of soda.agent.name with a custom name for your agent, if you wish.
    • Optionally, add the soda.core settings to configure idle workers in the cluster. Launch an idle worker so at scan time, the agent can hand over instructions to an already running idle Scan Launcher to avoid the start-from-scratch setup time for a pod. This helps your test scans from Soda Cloud run faster. You can have multiple idle scan launchers waiting for instructions.
      helm install soda-agent soda-agent/soda-agent \
       --set soda.agent.name=myuniqueagent \
       --set soda.apikey.id=*** \
       --set soda.apikey.secret=**** \
       --set soda.core.idle=true \
       --set soda.core.replicas=1 \
       --namespace soda-agent
      

      The command-line produces output like the following message:

      NAME: soda-agent
      LAST DEPLOYED: Mon Nov 21 16:29:38 2022
      NAMESPACE: soda-agent
      STATUS: deployed
      REVISION: 1
      
  3. (Optional) Validate the Soda Agent deployment by running the following command:
    kubectl get pods -n soda-agent
    
    NAME                                     READY   STATUS    RESTARTS   AGE
    soda-agent-orchestrator-ffd74c76-5g7tl   1/1     Running   0          32s
    
  4. In your Soda Cloud account, navigate to your avatar > Scans & Data > Agents tab. Refresh the page to verify that you see the agent you just created in the list of Agents.

    Be aware that the agent may take several minutes to appear in your list of Soda Agents.
  5. Next: Add a data source in Soda Cloud using the Soda Agent you just deployed. If you wish, you can create a practice data source so you can try adding a data source in Soda Cloud using the Soda Agent you just deployed.
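
The install command in this procedure passes --namespace soda-agent, and Helm 3 expects that namespace to exist unless you also pass --create-namespace. If you have not created the namespace yet, a small guard like the following can be run before helm install. This is a sketch under that assumption; `ensure_namespace` is a hypothetical helper name.

```shell
#!/bin/sh
# Create the namespace only when it does not exist yet, so the command
# is safe to run more than once.
# "ensure_namespace" is a hypothetical helper name.
ensure_namespace() {
  ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}

# Example:
# ensure_namespace soda-agent
```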

Deploy using CLI only - virtual cluster

  1. Use Helm to add the Soda Agent Helm chart repository.
    helm repo add soda-agent https://helm.soda.io/soda-agent/
    
  2. If you did not already create it while setting up the cluster, create a namespace for the agent.
    kubectl create ns soda-agent
    
    namespace/soda-agent created
    
  3. Use the following command to install the Helm chart which deploys a Soda Agent in your cluster. (Learn more about the helm install command.)
    • Replace the values of soda.apikey.id and soda.apikey.secret with the values you copied from the New Soda Agent dialog box in your Soda Cloud. The cluster stores these key values as Kubernetes secrets.
    • Replace the value of soda.agent.name with a custom name for your agent, if you wish.
    • Optionally, add the soda.core settings to configure idle workers in the cluster. Launch an idle worker so at scan time, the agent can hand over instructions to an already running idle Scan Launcher to avoid the start-from-scratch setup time for a pod. This helps your test scans from Soda Cloud run faster. You can have multiple idle scan launchers waiting for instructions.
      helm install soda-agent soda-agent/soda-agent \
       --set soda.agent.target=azure-aks-virtualnodes \
       --set soda.agent.name=myuniqueagent \
       --set soda.apikey.id=*** \
       --set soda.apikey.secret=**** \
       --set soda.core.idle=true \
       --set soda.core.replicas=1 \
       --namespace soda-agent
      

      The command-line produces output like the following message:

      NAME: soda-agent
      LAST DEPLOYED: Mon Nov 21 16:29:38 2022
      NAMESPACE: soda-agent
      STATUS: deployed
      REVISION: 1
      
  4. (Optional) Validate the Soda Agent deployment by running the following command:
    kubectl get pods -n soda-agent
    
    NAME                                     READY   STATUS    RESTARTS   AGE
    soda-agent-orchestrator-ffd74c76-5g7tl   1/1     Running   0          32s
    
  5. In your Soda Cloud account, navigate to your avatar > Scans & Data > Agents tab. Refresh the page to verify that you see the agent you just created in the list of Agents.

    Be aware that the agent may take several minutes to appear in your list of Soda Agents.
  6. Next: Add a data source in Soda Cloud using the Soda Agent you just deployed. If you wish, you can create a practice data source so you can try adding a data source in Soda Cloud using the Soda Agent you just deployed.
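
Rather than re-running kubectl get pods until the status reads Running, you can block until the pods report Ready. The sketch below uses kubectl wait with a timeout; `wait_for_agent` is a hypothetical helper name, and the 180s default is an arbitrary choice rather than a value from Soda.

```shell
#!/bin/sh
# Wait until every pod in the namespace reports the Ready condition.
# Arguments: namespace (default soda-agent), timeout (default 180s).
# "wait_for_agent" is a hypothetical helper name.
wait_for_agent() {
  ns="${1:-soda-agent}"
  timeout="${2:-180s}"
  if kubectl wait --for=condition=Ready pods --all \
       --namespace "$ns" --timeout="$timeout"; then
    echo "Soda Agent pods in $ns are ready"
  else
    echo "pods in $ns not ready after $timeout; inspect with: kubectl describe pods -n $ns" >&2
    return 1
  fi
}

# Example:
# wait_for_agent soda-agent 180s
```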

Deploy using a values YAML file

  1. Use Helm to add the Soda Agent Helm chart repository.
    helm repo add soda-agent https://helm.soda.io/soda-agent/
    
  2. Using a code editor, create a new YAML file called values.yml.
  3. Copy and paste the content below into that file, then adjust the following values:
    • Replace id and secret with the values you copied from the New Soda Agent dialog box in your Soda Cloud account.
    • Replace the value of name with a custom name for your agent, if you wish.
    • Optionally, add the soda.core settings to configure idle workers in the cluster. Launch an idle worker so at scan time, the agent can hand over instructions to an already running idle Scan Launcher to avoid the start-from-scratch setup time for a pod. This helps your test scans from Soda Cloud run faster. You can have multiple idle scan launchers waiting for instructions.
      soda:
        apikey:
          id: "***"
          secret: "***"
        agent:
          name: "myuniqueagent"
        core:
          idle: true
          replicas: 1
      
  4. Save the file. Then, create a namespace for the agent.
    kubectl create ns soda-agent
    
    namespace/soda-agent created
    
  5. In the same directory in which the values.yml file exists, use the following command to install the Soda Agent helm chart.
    helm install soda-agent soda-agent/soda-agent \
      --values values.yml \
      --namespace soda-agent
    
  6. (Optional) Validate the Soda Agent deployment by running the following command:
    kubectl describe pods -n soda-agent
    
  7. In your Soda Cloud account, navigate to your avatar > Scans & Data > Agents tab. Refresh the page to verify that you see the agent you just created in the list of Agents.
  8. Next: Add a data source in Soda Cloud using the Soda Agent you just deployed.
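
If you prefer not to type the API keys into an editor, the values file from step 3 can be generated from environment variables. This is a sketch only: SODA_API_KEY_ID, SODA_API_KEY_SECRET, and SODA_AGENT_NAME are hypothetical variable names, `write_values_file` is a hypothetical helper, and the function assumes the key values contain no double quotes.

```shell
#!/bin/sh
# Write a values file from environment variables so the keys never appear
# on the command line. The function body runs in a subshell, so the umask
# change and any early exit do not affect the calling shell.
# Variable and function names here are hypothetical.
write_values_file() (
  out="${1:-values.yml}"
  : "${SODA_API_KEY_ID:?set SODA_API_KEY_ID first}"
  : "${SODA_API_KEY_SECRET:?set SODA_API_KEY_SECRET first}"
  umask 077   # a newly created file is readable by the current user only
  cat > "$out" <<EOF
soda:
  apikey:
    id: "${SODA_API_KEY_ID}"
    secret: "${SODA_API_KEY_SECRET}"
  agent:
    name: "${SODA_AGENT_NAME:-myuniqueagent}"
  core:
    idle: true
    replicas: 1
EOF
)

# Example:
# write_values_file values.yml
```

The resulting file can then be passed to helm install with --values values.yml, as in step 5.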

(Optional) Create a practice data source

If you wish to try creating a new data source in Soda Cloud using the agent you deployed, you can use the following command to create a PostgreSQL warehouse containing example data from the NYC Bus Breakdowns and Delay Dataset.

From the command-line, copy+paste and run the following to create the data source as a pod on your new cluster.

cat <<EOF | kubectl apply -n soda-agent -f -
---
apiVersion: v1
kind: Pod
metadata:
  name: nybusbreakdowns
  labels:
    app: nybusbreakdowns
spec:
  containers:
  - image: sodadata/nybusbreakdowns
    imagePullPolicy: IfNotPresent
    name: nybusbreakdowns
    ports:
    - name: tcp-postgresql
      containerPort: 5432
  restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nybusbreakdowns
  name: nybusbreakdowns
spec:
  ports:
  - name: tcp-postgresql
    port: 5432
    protocol: TCP
    targetPort: tcp-postgresql
  selector:
    app: nybusbreakdowns
  type: ClusterIP
EOF

Output:

pod/nybusbreakdowns created
service/nybusbreakdowns created


Once the pod of practice data is running, you can use the following configuration details when you add a data source in Soda Cloud, in step 2, Connect the Data Source.

data_source your_datasource_name:
  type: postgres
  connection:
    host: nybusbreakdowns
    port: 5432
    username: sodacore
    password: sodacore
    database: sodacore
    schema: new_york

About the helm install command

helm install soda-agent soda-agent/soda-agent \
  --set soda.agent.target=azure-aks-virtualnodes \
  --set soda.agent.name=myuniqueagent \
  --set soda.apikey.id=*** \
  --set soda.apikey.secret=**** \
  --namespace soda-agent

Command part | Description
helm install | The action Helm is to take.
soda-agent (the first one) | A release named soda-agent on your cluster.
soda-agent (the second one) | The name of the Helm repo you installed.
soda-agent (the third one) | The name of the Helm chart that is the Soda Agent.

The --set options either override or set some of the values defined in and used by the Helm chart. You can override these values with the --set flags as this command does, or you can specify the override values using a values.yml file.

Parameter key | Parameter value, description
--set soda.agent.target | (Optional) The type of cluster the command targets. Use when deploying to aws-eks or azure-aks-virtualnodes.
--set soda.agent.name | A unique name for your Soda Agent. Choose any name you wish, as long as it is unique in your Soda Cloud account.
--set soda.apikey.id | With the apikey.secret, this connects the Soda Agent to your Soda Cloud account. Use the value you copied from the dialog box in Soda Cloud when adding a new agent. You can use a values.yml file to pass this value to the cluster instead of exposing it here.
--set soda.apikey.secret | With the apikey.id, this connects the Soda Agent to your Soda Cloud account. Use the value you copied from the dialog box in Soda Cloud when adding a new agent. You can use a values.yml file to pass this value to the cluster instead of exposing it here.
--set soda.core.idle=true | (Optional) Launch an idle worker so at scan time, the agent can hand over instructions to an already running idle Scan Launcher to avoid the start-from-scratch setup time for a pod. You can have multiple idle scan launchers waiting for instructions.
--set soda.core.replicas=1 | (Optional) Replicate an idle worker to have more workers ready to handle instructions without setting up a new pod.
--namespace soda-agent | Use the namespace value to identify the namespace in which to deploy the agent.
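
To see how a dotted --set key corresponds to the nested structure of a values.yml file, the following sketch expands one key=value pair into YAML. It is a simplified illustration, not Helm's own parser: it ignores Helm's escaping, list, and type-conversion rules, and `set_to_yaml` is a hypothetical helper name.

```shell
#!/bin/sh
# Expand a dotted key=value pair into nested YAML, two spaces per level.
# Simplified illustration only; Helm's real --set parser handles escaping,
# lists, and types. The body runs in a subshell so its variables stay local.
# "set_to_yaml" is a hypothetical helper name.
set_to_yaml() (
  key=${1%%=*}     # everything before the first "="
  value=${1#*=}    # everything after the first "="
  indent=""
  # Peel off one dotted segment per iteration until no dot remains.
  while [ "${key#*.}" != "$key" ]; do
    printf '%s%s:\n' "$indent" "${key%%.*}"
    key=${key#*.}
    indent="$indent  "
  done
  printf '%s%s: %s\n' "$indent" "$key" "$value"
)

set_to_yaml soda.agent.name=myuniqueagent
# prints:
# soda:
#   agent:
#     name: myuniqueagent
```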


Decommission the Soda Agent and the AKS cluster

  1. Delete everything in the namespace which you created for the Soda Agent.
    kubectl delete ns soda-agent
    
  2. Delete the cluster. Be patient; this task may take some time to complete.
    az aks delete --resource-group SodaAgent --name SodaAgentCluster --yes
    
  3. If you created an additional subnet for virtual nodes, delete the subnet. The subnet and vnet values must match the names you used during deployment.
    az network vnet subnet delete \
      --resource-group SodaAgent \
      --name SodaAgentVirtualNodeSubnet \
      --vnet-name SodaAgentVnet
    

Troubleshoot deployment

Refer to Helpful kubectl commands for instructions on accessing logs, etc.


Problem: Scans launched from Soda Cloud take an excessive amount of time to run.

Solution: Consider adjusting the number of replicas for idle workers with kubectl. Launch extra idle workers so at scan time, the agent can hand over instructions to an already running idle Scan Launcher to avoid the start-from-scratch setup time for a pod.

  1. Ensure that the agent was deployed with the soda.core configurations for idle: true and replicas: 1 or more.
  2. Run the following command to increase the number of active replicas to 2.
    kubectl scale deployment/soda-agent-scanlauncher \
      --replicas 2 -n soda-agent
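
Before scaling, it can help to check how many replicas the deployment is already configured to run, so that you never scale down by accident. The sketch below reads the current value and scales only when it is lower than the number you ask for. It assumes the deployment name soda-agent-scanlauncher from the command above; `ensure_idle_replicas` is a hypothetical helper name.

```shell
#!/bin/sh
# Scale the scan launcher up to the wanted replica count, but never down.
# Arguments: wanted replicas, namespace (default soda-agent).
# "ensure_idle_replicas" is a hypothetical helper name.
ensure_idle_replicas() {
  want="$1"
  ns="${2:-soda-agent}"
  have=$(kubectl get deployment soda-agent-scanlauncher \
    --namespace "$ns" -o jsonpath='{.spec.replicas}') || return 1
  if [ "$have" -ge "$want" ]; then
    echo "already at $have replicas"
  else
    kubectl scale deployment/soda-agent-scanlauncher \
      --replicas "$want" --namespace "$ns"
  fi
}

# Example:
# ensure_idle_replicas 2
```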
    


Problem: When you attempt to create a cluster, you get an error that reads, An RSA key file or key value must be supplied to SSH Key Value. You can use --generate-ssh-keys to let CLI generate one for you.

Solution: Run the same command to create a cluster but include an extra line at the end to generate RSA keys.

az aks create \
  --resource-group SodaAgent \
  --name SodaAgentCluster \
  --node-count 1 \
  --generate-ssh-keys

Go further

