# Soda Runner

{% hint style="danger" %}
**Nomenclature change: "Agent" is being updated to "Runner"**

The Soda Agent is being renamed to [**Soda Runner**](/reference/soda-runner-basic-concepts.md) across all Soda products and interfaces. **This change will soon affect deployment terminology**, and includes:

* Soda Cloud UI labels
* API permission names: `MANAGE_DATASOURCES_AND_AGENTS` → `MANAGE_DATASOURCES_AND_RUNNERS`
* CLI flags: `--use-agent` → `--use-runner`
* Python API methods: `verify_contract_on_agent` → `verify_contract_on_runner`

Deployment options still reference "agent", but will change to "runner" in the coming weeks.
{% endhint %}
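While both names are in circulation, calling code can look up whichever Python method name is available. The following is a minimal sketch, not Soda's own migration path: it uses only the two method names from the rename list above, and the `SimpleNamespace` objects are hypothetical stand-ins for whatever module or object exposes those methods in your installation.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Method names taken from the rename list above; the new name is preferred.
NEW_NAME = "verify_contract_on_runner"
OLD_NAME = "verify_contract_on_agent"


def resolve_remote_verify(api: Any) -> Callable[..., Any]:
    """Return the runner-named callable if present, else the agent-named one."""
    for name in (NEW_NAME, OLD_NAME):
        fn = getattr(api, name, None)
        if callable(fn):
            return fn
    raise AttributeError(f"{api!r} exposes neither {NEW_NAME} nor {OLD_NAME}")


# Hypothetical stand-ins for illustration only; a real caller would pass the
# imported Soda module or object instead.
old_api = SimpleNamespace(verify_contract_on_agent=lambda: "agent")
new_api = SimpleNamespace(verify_contract_on_runner=lambda: "runner")

print(resolve_remote_verify(old_api)())
print(resolve_remote_verify(new_api)())
```

Preferring the new name first means the helper keeps working unchanged once the old name is removed.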

The **Soda Runner** is a tool that lets Soda Cloud users **securely access data sources** and scan them for data quality. For a self-hosted runner, **create a Kubernetes cluster** in a cloud services provider environment, then use Helm to deploy a Soda Runner in the cluster.

This setup enables Soda Cloud users to securely connect to data sources (Snowflake, Amazon Athena, etc.) **from within the Soda Cloud web application**. Any user in your Soda Cloud account can add a new data source via the runner, then write their own no-code data quality checks for the new data source.

When you deploy a runner, you also deploy **three types of workloads** in your Kubernetes cluster from a Docker image:

* a **Soda Runner Orchestrator**, which creates Kubernetes Jobs to trigger scheduled and on-demand scans of data
* a **Soda Runner Scan Launcher**, which wraps around the Soda Python libraries that implement the scans
* a **Soda Runner Contract Launcher**, which wraps around the Soda Core Python libraries to provide data contract functionality

<figure><img src="/files/6gx7IAj9Nb6ie278Ija5" alt=""><figcaption></figcaption></figure>

### How does Soda integrate with Kubernetes?

**Kubernetes** is a system for orchestrating containerized applications; a **Kubernetes cluster** is a set of resources that supports an application deployment.

You need a Kubernetes cluster in which to deploy the containerized applications that make up the **Soda Runner**. The Soda Runner Helm chart uses Kubernetes [**Secrets**](https://kubernetes.io/docs/concepts/configuration/secret/) to store the connection secrets that you specify as values when you install the Helm release. Depending on your cloud provider, you can arrange to store these Secrets in specialized storage such as [Azure Key Vault](https://learn.microsoft.com/en-us/azure/key-vault/general/basic-concepts), [AWS Key Management Service (KMS)](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html), or [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/).

> Learn more about [using external secrets](broken://pages/0Cs2lWTZr1mIVYrDs9Y0#using-existing-external-secrets).

The Jobs that the runner creates access these Secrets when they execute.
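To make the relationship between Secrets and Jobs concrete, the sketch below builds a generic Kubernetes Secret manifest and a Job manifest that reads it as environment variables. It illustrates the general Kubernetes mechanism only and is not taken from the Soda Runner Helm chart's templates; the resource names, the environment variable, and the image reference are all hypothetical.

```python
import base64
import json


def make_secret(name: str, values: dict[str, str]) -> dict:
    """Build a v1 Secret manifest; Kubernetes expects `data` values base64-encoded."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": encoded,
    }


def make_job(name: str, image: str, secret_name: str) -> dict:
    """Build a batch/v1 Job whose container receives every Secret key as an env var."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "scan",
                            "image": image,
                            "envFrom": [{"secretRef": {"name": secret_name}}],
                        }
                    ],
                }
            }
        },
    }


# Hypothetical names and values, for illustration only.
secret = make_secret("datasource-credentials", {"WAREHOUSE_PASSWORD": "example-only"})
job = make_job("example-scan", "example.invalid/scan:latest", "datasource-credentials")
print(json.dumps(secret, indent=2))
print(json.dumps(job, indent=2))
```

Because the Job refers to the Secret by name, the credential value never appears in the Job definition itself.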

> Learn more about [Kubernetes concepts](https://www.youtube.com/watch?v=BOj1sgWVXko).

### Where can a Soda Runner be deployed?

You create your Kubernetes cluster within a cloud services provider environment. You can deploy a Soda Runner in **any environment in which you can create Kubernetes clusters**, such as:

* Amazon Elastic Kubernetes Service (EKS)
* Microsoft Azure Kubernetes Service (AKS)
* Google Kubernetes Engine (GKE)
* Any cluster that runs standard Kubernetes, version 1.21 or greater
* Locally, for testing purposes, using tools like [Minikube](https://minikube.sigs.k8s.io/docs/), [microk8s](https://microk8s.io/docs), [kind](https://kind.sigs.k8s.io/), [k3s](https://docs.k3s.io/), or [Docker Desktop](https://www.docker.com/products/docker-desktop/) with Kubernetes support.
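The version requirement in the list above can be checked before deployment. This is a minimal sketch that assumes you already have the server version as a string, for example the `gitVersion` value a cluster reports; the sample strings, including their vendor suffixes, are illustrative.

```python
import re

MINIMUM = (1, 21)  # minimum Kubernetes version stated in the list above


def parse_k8s_version(text: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.27.3' or 'v1.29.4-eks-036c24b'."""
    match = re.match(r"v?(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text: str) -> bool:
    """Compare (major, minor) as a tuple so that 1.9 sorts below 1.21."""
    return parse_k8s_version(text) >= MINIMUM


for candidate in ("v1.20.15", "v1.21.0", "v1.29.4-eks-036c24b", "v1.30.2+k3s1"):
    print(candidate, meets_minimum(candidate))
```

Comparing integer tuples avoids the string-comparison trap in which `"1.9"` would sort above `"1.21"`.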

### What is Helm?

**Helm** is a **package manager for Kubernetes** that bundles YAML files together for storage in a public or private repository. This bundle of YAML files is referred to as a **Helm chart**. <mark style="background-color:$primary;">The Soda Runner is a Helm chart</mark>. Anyone with access to the Helm chart’s repo can deploy the chart and use the YAML files in it.

> Learn more about [Helm concepts](https://www.youtube.com/watch?v=-ykwb1d0DXU).

The Soda Runner Helm chart is stored on a [public repository](https://helm.soda.io/soda-agent/) and published on [ArtifactHub.io](https://artifacthub.io/packages/helm/soda-agent/soda-agent). Anyone can use Helm to find and deploy the Soda Runner Helm chart in their Kubernetes cluster.
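As a rough illustration of finding and deploying the chart, the sketch below assembles the Helm commands as argument lists and prints them without running anything. The repository URL and chart name come from the links above; the repository alias, release name, namespace, and values file are assumptions for the example, and the values the chart actually requires are not covered here.

```python
import shlex

REPO_NAME = "soda-agent"  # local alias for the repository (assumption)
REPO_URL = "https://helm.soda.io/soda-agent/"  # public repository linked above
CHART = f"{REPO_NAME}/soda-agent"  # chart name as listed on ArtifactHub


def helm_commands(release: str, namespace: str, values_file: str) -> list[list[str]]:
    """Return the Helm invocations as argument lists; nothing is executed here."""
    return [
        ["helm", "repo", "add", REPO_NAME, REPO_URL],
        ["helm", "repo", "update"],
        [
            "helm", "install", release, CHART,
            "--namespace", namespace,
            "--create-namespace",
            "--values", values_file,
        ],
    ]


# Hypothetical release name, namespace, and values file.
for argv in helm_commands("soda-runner", "soda", "values.yaml"):
    print(shlex.join(argv))
```

Keeping each command as a list makes it straightforward to pass to `subprocess.run` later without shell-quoting problems.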

### Why Kubernetes?

Kubernetes is the most **powerful** and **future-proof** platform for running the Soda Runner because it delivers the best of both worlds: the **flexibility of raw compute** without the operational burden, and the **scalability of managed services** without their restrictions.

* Kubernetes goes far beyond raw compute like EC2 or traditional Virtual Machines (VMs) by **abstracting away the heavy lifting of networking, deployments, and scaling**, while still giving teams precise control when needed. Practically, this makes it easy for Soda’s customers to [deploy, manage, and upgrade Soda Runners](broken://pages/DVzqeL43wc3c9AsZ4TFi) using [Kubernetes](#how-does-soda-integrate-with-kubernetes) and [Helm](#what-is-helm), always staying up to date with the latest releases.
* Unlike fully managed options such as AWS Lambda, **Kubernetes has no execution time limits** and is built to handle long-running, stateful, and highly scalable workloads. This means **Soda is not limited to lightweight samples** but can perform complete, row-level operations. These operations power advanced capabilities such as the Diagnostics Warehouse, which securely stores the exact failing records inside your own infrastructure, and [Reconciliation Checks](broken://pages/QkO7w20yeaFP3U9Ow5Kk), which compare data at row level across sources.

Whether running in the cloud or on-premises, Kubernetes ensures resilience, portability, and cost-efficient resource use, making it the clear choice for complex, enterprise-grade data quality workloads.

