# Migrate from v3 to v4

This guide details the steps required to migrate datasets from **v3** to **v4 contracts**.

The migration needs to be enabled by the Soda team. Contact us to ensure that the feature flag (`v3ToV4DatasetMigrationEnabled`) is enabled for your organization.

{% hint style="warning" %}
This action requires the global **Admin** role and can be [enabled for users with **Manage Contract** permission](#enable-dataset-owners-to-migrate-datasets-from-v3-v4).
{% endhint %}

## Using Soda Cloud

### Start the migration

The migration can be performed in two ways:

1. **Bulk migration:** On the datasets page, you can select multiple datasets to migrate them all at once.
2. **Single migration:** On a dataset page, you can migrate the dataset individually.

<div><figure><img src="/files/ILCxdvbTsOkiBZjlVAON" alt=""><figcaption></figcaption></figure> <figure><img src="/files/h1RSrvpNL460OsTrnJJn" alt=""><figcaption></figcaption></figure></div>

Once initiated, Soda guides you through each step of the process to ensure a safe and transparent migration.

### Select a v4 data source

In the first step, you’ll select which **v4 data source** the dataset(s) should be migrated to.

1. Choose an existing **v4 data source** from the dropdown list, or click **Add a new v4 Data Source** to create a new one.
2. Click **Next**.

{% hint style="warning" %}
You can only migrate v3 datasets to a **data source of the same type**. For example, from Snowflake to Snowflake.

The migration process in Soda Cloud requires a **valid connection** configuration to your data source **through Soda Agent**.
{% endhint %}

<figure><img src="/files/wGz2suj4LekqJJWXqIkM" alt=""><figcaption></figcaption></figure>

### Configure migration settings

In this step, you can customize optional settings before running the migration.

**Available configurations:**

* **Contract Schedule:** Set a schedule for automatic contract verification.
* **Migrate History:** Include up to 90 days of historical check results in the migration.
* **Migrate Responsibilities:** Migrate the dataset's ownership and assigned responsibilities.
* **Migrate Attributes:** Migrate the dataset's attributes.

Once your settings are configured, click **Continue** to proceed.

<figure><img src="/files/PcHYiC4w0ez1SXwzvYs8" alt=""><figcaption></figcaption></figure>

### Review migration

Before migrating, you’ll see a detailed summary of what will be migrated and what won’t.\
Each dataset displays:

* **Number of checks** that were:
  * **Successfully translated:** The behavior of the new checks matches that in v3.
  * **Translated with a warning:** Translated, but not an exact one-to-one match. Review recommended.
  * **Not translated:** Checks are either not supported in v4 or cannot be translated. See [Current limitations](#current-limitations).
* For each check with a warning or that was not translated, you can view the **definition of the v3 check,** along with an **indicative remark** explaining the reason or impact.

<figure><img src="/files/jXxgHs9got7vU40EWNk4" alt=""><figcaption></figcaption></figure>

Click the **eye icon** to open a preview of the generated contract before finalizing the migration. The preview lets you verify the contract structure, filters, variables, and checks. You can test the contract directly from this view to ensure everything runs as expected before confirming the migration.

<div><figure><img src="/files/s8z1yqykbHI4oMog4zfw" alt=""><figcaption></figcaption></figure> <figure><img src="/files/wkGIvlSU1UsHGW45NncS" alt=""><figcaption></figcaption></figure></div>

### Complete migration

After reviewing, use the checkboxes to deselect any datasets that are not ready for migration, then click **Migrate** to finalize the process.

This action starts a **background process** to migrate the selected datasets. Depending on the volume being migrated, the migration may take from a few seconds to a few minutes to complete.

<figure><img src="/files/LcWuLFZbZhIXbxX86W8Z" alt=""><figcaption></figcaption></figure>

Once started, you can navigate to the **v3 datasets** to monitor progress.\
Refresh the page after a few moments to view the migration results.

<figure><img src="/files/wfzGZqVS8H79Wa1bJT8z" alt=""><figcaption></figcaption></figure>

### Post-migration results

Once migration is complete:

* Your original **v3 dataset** remains accessible but is marked as *migrated*.
* Your new **v4 dataset** is fully active and ready for contract-based validation.

The v3 dataset page shows its migration status as **Completed migration**.

In the v4 dataset view, you can now see:

* Migrated checks under **Dataset checks** and **Column checks**.
* Contract verification results and coverage metrics.
* The option to edit or version your new data contract.

<figure><img src="/files/1tAhadhC80GctU1ngbke" alt=""><figcaption></figcaption></figure>

## Using Soda CLI

### Install the migration CLI

In a virtual environment, install the Soda migration package as well as the Soda Core package for your data source (see [Data source reference for Soda Core](/reference/data-source-reference-for-soda-core.md)).

{% hint style="warning" %}
The migration process requires the **`soda-migration` package**, installed from the [**private PyPI**](#private-pypi-installation-flow) with a **Team** or **Enterprise** license.

Need access to the PyPI repository? Please [contact us](https://www.soda.io/contact).
{% endhint %}

Choose your organization host to install the migration package:

{% tabs %}
{% tab title="Team EU" %}
{% code overflow="wrap" %}

```bash
pip install soda-migration -i "https://<api_key_id>:<api_key_secret>@team.pypi.cloud.soda.io"
```

{% endcode %}
{% endtab %}

{% tab title="Team US" %}
{% code overflow="wrap" %}

```bash
pip install soda-migration -i "https://<api_key_id>:<api_key_secret>@team.pypi.cloud.us.soda.io"
```

{% endcode %}
{% endtab %}

{% tab title="Enterprise EU" %}
{% code overflow="wrap" %}

```bash
pip install soda-migration -i "https://<api_key_id>:<api_key_secret>@enterprise.pypi.cloud.soda.io"
```

{% endcode %}
{% endtab %}

{% tab title="Enterprise US" %}
{% code overflow="wrap" %}

```bash
pip install soda-migration -i "https://<api_key_id>:<api_key_secret>@enterprise.pypi.cloud.us.soda.io"
```

{% endcode %}
{% endtab %}
{% endtabs %}

In addition to the migration package, **you must install the Soda Core package that connects to your data source**. See [Data source reference for Soda Core](/reference/data-source-reference-for-soda-core.md). This is required because Soda connects to your data source to generate the contract skeleton before translating existing checks.

### Configuration

Create a `migration.json` file to map v3 dataset IDs to v4 DQNs (Dataset Qualified Names). This file defines which datasets to migrate and their corresponding new DQNs.

Example structure:

```json
[
    {
        "dqn": "v4_data_source/postgres/public/dim_employee",
        "v3DatasetId": "6d706754-4d0d-4aa0-9c7d-25fbe07cb3f6"
    },
    {
        "dqn": "v4_data_source/postgres/public/<another_dataset>",
        "v3DatasetId": "<another-v3-dataset-id>"
    }
]
```
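
Before running the migration, it can help to sanity-check the mapping file. The sketch below is a minimal illustration, not part of the Soda tooling: the function name `validate_migration_config` is hypothetical, and treating repeated `dqn` or `v3DatasetId` values as problems is an assumption (a likely copy-paste mistake), not a documented rule.

```python
from collections import Counter

REQUIRED_KEYS = ("dqn", "v3DatasetId")


def validate_migration_config(entries):
    """Return a list of human-readable problems found in the mapping."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list of mappings"]

    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty '{key}'")

    # Assumption: a value that appears more than once is a likely mistake
    for key in REQUIRED_KEYS:
        counts = Counter(
            e[key] for e in entries
            if isinstance(e, dict) and isinstance(e.get(key), str)
        )
        for value, count in counts.items():
            if count > 1:
                problems.append(f"duplicate {key}: {value} ({count} times)")
    return problems


# Inline example; in practice, pass the result of json.load() on migration.json
example = [
    {"dqn": "v4_data_source/postgres/public/dim_employee",
     "v3DatasetId": "6d706754-4d0d-4aa0-9c7d-25fbe07cb3f6"},
    {"dqn": "v4_data_source/postgres/public/dim_employee",
     "v3DatasetId": "6d706754-4d0d-4aa0-9c7d-25fbe07cb3f6"},
]
for problem in validate_migration_config(example):
    print(problem)
```

An empty list means no problems were found by these checks; it does not guarantee that the DQNs exist in your v4 data source.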

> Learn more about DQNs in the [Contract Language reference](/reference/contract-language-reference.md#dataset-fully-qualified-name).

#### Retrieve the v3 dataset IDs

To retrieve the IDs of the v3 datasets you want to migrate from Soda Cloud:

* Use the Soda Cloud API endpoint `GET /api/v1/datasets` to fetch the datasets' information and their IDs.
* The ID can also be found in the URL of a given dataset's page.

<details>

<summary>Example Python script to generate the configuration file</summary>

```python
#!/usr/bin/env python3
"""
Soda Cloud Dataset Migration Configuration Generator

This script fetches dataset information from Soda Cloud API and generates
a migration.json file for dataset migration. This file is used to migrate
datasets from V3 to V4.
"""

import argparse
import base64
import json
import os
import subprocess
import sys
from urllib.parse import urlencode

from dotenv import load_dotenv  # third-party: pip install python-dotenv


def generate_basic_auth_token(api_key, api_secret):
    """Generate Basic Auth token from API key and secret."""
    token = f"{api_key}:{api_secret}"
    return base64.b64encode(token.encode()).decode()


def execute_curl_with_auth(url, auth_token):
    """Execute curl command with Basic Auth and return JSON response."""
    try:
        headers = f"Authorization: Basic {auth_token}"
        command = ['curl', '-s', url, '--header', headers]
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        return json.loads(result.stdout)
    except subprocess.CalledProcessError as e:
        print("Curl failed with error:\n", e.stderr)
        return None
    except json.JSONDecodeError as e:
        print(f"Failed to parse JSON response: {e}")
        return None


def fetch_datasets(api_key, api_secret, v3_data_source_name=None):
    """Fetch datasets from Soda Cloud API."""
    # Adjust the host if your Soda Cloud organization is not on cloud.soda.io
    url = "https://cloud.soda.io/api/v1/datasets"
    params = {}
    if v3_data_source_name:
        params["datasourceName"] = v3_data_source_name
    # Only the first page (up to 1000 datasets) is requested
    params["size"] = 1000
    # urlencode escapes spaces and special characters in the data source name
    url += "?" + urlencode(params)
    
    auth_token = generate_basic_auth_token(api_key, api_secret)
    
    print(f"Fetching datasets from {url}...")
    response = execute_curl_with_auth(url, auth_token)
    
    if response is None:
        print("Failed to fetch datasets from API")
        return None
    
    if "content" not in response:
        print("Unexpected API response format")
        return None
    
    return response["content"]


def generate_migration_data(datasets, v4_data_source_name):
    """Generate migration data from datasets."""
    migration_data = []
    
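    for dataset in datasets:
        # The prefix (e.g. "database.schema") may be missing; skip empty parts
        prefix = dataset["datasource"].get("prefix") or ""
        parts = [v4_data_source_name, *prefix.split("."), dataset["name"]]
        migration_entry = {
            "v3DatasetId": dataset["id"],
            "dqn": "/".join(part for part in parts if part)
        }
        migration_data.append(migration_entry)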
    
    return migration_data


def save_migration_file(migration_data, output_path):
    """Save migration data to JSON file."""
    try:
        with open(output_path, 'w') as f:
            json.dump(migration_data, f, indent=2)
        print(f"Migration file saved to: {output_path}")
        return True
    except Exception as e:
        print(f"Failed to save migration file: {e}")
        return False


def main():
    """Main application entry point."""
    parser = argparse.ArgumentParser(
        description="Generate Soda Cloud dataset migration file"
    )
    parser.add_argument(
        "v4_data_source_name",
        help="V4 data source name for migration"
    )
    parser.add_argument(
        "v3_data_source_name",
        help="V3 data source name to filter datasets (used as query parameter)"
    )
    parser.add_argument(
        "--output",
        "-o",
        default="migration.json",
        help="Output file path (default: migration.json)"
    )
    parser.add_argument(
        "--env-file",
        default="../.env",
        help="Environment file path (default: ../.env)"
    )
    
    args = parser.parse_args()
    
    # Load environment variables
    load_dotenv(args.env_file)
    
    # Get API credentials from environment
    api_key = os.getenv("SODA_CLOUD_API_KEY")
    api_secret = os.getenv("SODA_CLOUD_API_SECRET")
    
    if not api_key or not api_secret:
        print("Error: SODA_CLOUD_API_KEY and SODA_CLOUD_API_SECRET must be set in environment")
        print("You can set them in a .env file or as environment variables")
        sys.exit(1)
    
    print(f"Generating migration for data source: {args.v4_data_source_name}")
    print(f"Filtering datasets by V3 data source: {args.v3_data_source_name}")
    
    # Fetch datasets from API
    datasets = fetch_datasets(api_key, api_secret, args.v3_data_source_name)
    if datasets is None:
        sys.exit(1)
    
    print(f"Found {len(datasets)} datasets")
    
    # Generate migration data
    migration_data = generate_migration_data(datasets, args.v4_data_source_name)
    
    # Save to file
    if save_migration_file(migration_data, args.output):
        print("Migration generation completed successfully!")
    else:
        sys.exit(1)


if __name__ == "__main__":
    main()

```

</details>

***

### Generate contracts

Run the following command, replacing the paths and file names with your setup. Note that this action requires a valid connection configuration to your data source and Soda Cloud.

```sh
soda contract-migrator generate-bulk \
--bulk-config-file migration.json \
--output-directory contracts/ \
--v4-data-source ds.yml \
--soda-cloud soda-cloud.yml \
--schedule "0 0 * * *" \
-v
```
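
If you run the generation step from a script or a CI job, the same command can be assembled programmatically. The sketch below is one possible approach under stated assumptions: the helper name `build_generate_bulk_command` is hypothetical, and it uses only the flags documented on this page. Building the command as an argument list avoids shell quoting issues with the cron expression.

```python
import shlex


def build_generate_bulk_command(config_file, output_dir, data_source_file,
                                soda_cloud_file, schedule=None, verbose=False):
    """Assemble the argument list for `soda contract-migrator generate-bulk`."""
    command = [
        "soda", "contract-migrator", "generate-bulk",
        "--bulk-config-file", config_file,
        "--output-directory", output_dir,
        "--v4-data-source", data_source_file,
        "--soda-cloud", soda_cloud_file,
    ]
    if schedule:
        # Passed as a single argument, so the shell never expands the "*"
        command += ["--schedule", schedule]
    if verbose:
        command.append("--verbose")
    return command


command = build_generate_bulk_command(
    "migration.json", "contracts/", "ds.yml", "soda-cloud.yml",
    schedule="0 0 * * *", verbose=True,
)
# Print a copy-pasteable, shell-quoted version of the command
print(shlex.join(command))
# To execute it instead: subprocess.run(command, check=True)
```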

#### Parameters

| **Parameter** | **Required** | **Description** |
| --- | --- | --- |
| `--bulk-config-file` | Yes | Path to the JSON configuration file (e.g., `migration.json`) that maps v3 dataset IDs to v4 DQNs. |
| `--output-directory` | Yes | Directory where the generated v4 contracts will be saved. The structure within this location mirrors the DQN hierarchy. |
| `--v4-data-source` | Yes | Path to the local v4 data source YAML file (e.g., `ds.yml`). Used to retrieve schema metadata during contract generation. |
| `--soda-cloud` | Yes | Path to the configuration file used to connect to Soda Cloud (e.g., `soda-cloud.yml`). |
| `--schedule` | No | Optional cron expression defining the schedule for the generated contracts (e.g., `"0 0 * * *"`). |
| `--verbose`, `-v` | No | Enables verbose output for detailed logs during the migration process. |

### Review migration

Ensure that each generated contract includes:

* Correct `v4_dqn` in the `dataset` property
* Correct `v3_dataset_id` property referencing v3 dataset
* Correct **columns and types**
* Checks migrated from `sodacl` into `contracts` checks
* v3 check IDs present in the `qualifier` field for each check
* Accurate **check filters**, expressions, and metadata

See [Current limitations](#current-limitations) for the checks that cannot be automatically migrated yet. You can still add those checks manually and migrate their history by setting the v3 check ID in the `qualifier`.
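
To spot checks that lack a `qualifier`, you can walk the parsed contract. The sketch below is a minimal illustration, not part of the Soda tooling. It assumes the contract has already been loaded into a Python dictionary (for example with PyYAML's `yaml.safe_load`), and it assumes a layout with a top-level `checks` list, a `columns` list whose items may carry their own `checks`, and each check written as a single-key mapping. The helper names, check types, and IDs in the sample are hypothetical.

```python
def iter_checks(contract):
    """Yield (location, check_type, body) for dataset- and column-level checks."""
    for check in contract.get("checks") or []:
        if isinstance(check, dict):
            for check_type, body in check.items():
                yield "dataset", check_type, body or {}
    for column in contract.get("columns") or []:
        for check in column.get("checks") or []:
            if isinstance(check, dict):
                for check_type, body in check.items():
                    yield f"column {column.get('name')}", check_type, body or {}


def checks_without_qualifier(contract):
    """List the checks that have no qualifier set."""
    return [
        f"{location}: {check_type}"
        for location, check_type, body in iter_checks(contract)
        if not body.get("qualifier")
    ]


# Hypothetical contract, shaped according to the assumptions above
contract = {
    "dataset": "v4_data_source/postgres/public/dim_employee",
    "columns": [
        {"name": "id", "checks": [{"missing": {"qualifier": "v3-check-id-1"}}]},
        {"name": "email", "checks": [{"missing": None}]},
    ],
    "checks": [{"row_count": {"qualifier": "v3-check-id-2"}}],
}
for finding in checks_without_qualifier(contract):
    print(finding)
```

A check reported here is not necessarily an error: manually added checks have no v3 counterpart unless you set the v3 check ID yourself.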

### Complete migration

Once contracts are verified, publish them to Soda Cloud with the following command:

```sh
soda contract-migrator publish \
--contract contracts/ \
--soda-cloud soda-cloud.yml \
--migrate-attributes \
--migrate-responsibilities \
--migrate-test-results 
```

#### Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| `--contract` | Yes | <p>Specifies the path to the folder containing the contracts or a specific contract file to be migrated.<br><br>This parameter supports <strong>recursive directory traversal</strong>.</p><p><br>⚠️ <strong>Note:</strong> If the folder includes contracts for datasets that have already been migrated, those datasets will be <strong>migrated again</strong>.</p> |
| `--soda-cloud` | Yes | Configuration required to connect to Soda Cloud. |
| `--migrate-test-results` | No | <p>Include test history from v3 to v4. A maximum of 90 days of history is migrated.<br><br>Default is <code>false</code>.</p> |
| `--migrate-responsibilities` | No | <p>Copy responsibilities from v3 datasets to v4 datasets.<br><br>Default is <code>false</code>.</p> |
| `--migrate-attributes` | No | <p>Copy attributes from v3 datasets to v4 datasets.<br><br>Default is <code>false</code>.</p> |
| `--verbose` | No | Enable verbose mode. |

### Post-migration results

After publishing, confirm the following:

* **Logs:** Check migration status (overall and per dataset).
* **UI Overview:** Verify datasets in the Soda Cloud UI.
* **Checks:** Ensure checks appear with full history.
* **Flags:** Confirm presence of the following indicators:
  * **v4 flag** present
  * **v3 link** available
  * **Migration completed** flag present

<figure><img src="/files/RFELj7j8Uloo0fVFfsVU" alt=""><figcaption></figcaption></figure>

## Notes and recommendations

* Migration does not delete v3 datasets; it only marks them as migrated. Once the migration is completed, update your pipelines or agreements to stop executing v3 checks. You can then remove the v3 datasets individually, or remove the v3 data source to remove all its datasets.
* Verify that your v4 data source has a valid connection configuration before migrating. Soda connects to your data source to generate the contract structure.
* Review the migrated contract in detail before finishing the migration. If necessary, you can re-run the migration; see [Re-running a migration](#re-running-a-migration).

## Enable dataset owners to migrate datasets from v3→v4

An organization **Admin** can enable users with **Manage Contract** permission to migrate datasets.

{% hint style="info" %}
**Dataset owners** have the **Manage Contract** permission by default.
{% endhint %}

{% stepper %}
{% step %}
Click on your profile and navigate to **Organization Settings**

<figure><img src="/files/JTZ9lZ61TGsnNUUcg01F" alt=""><figcaption></figcaption></figure>
{% endstep %}

{% step %}
Under **Dataset migration**, check the option "Allow users with Manage Contract permission to migrate datasets"

This option is disabled by default.

<figure><img src="/files/TBBbE5hhu7Mg7bERlanJ" alt=""><figcaption></figcaption></figure>
{% endstep %}
{% endstepper %}

Users with permission will now be able to see the migration tool:

<figure><img src="/files/NkxhgFxVzAIWkhXyNU9F" alt=""><figcaption></figcaption></figure>

***

## View and filter migrated datasets

After migration, you can review and manage your datasets from the **Datasets** page in Soda Cloud.

Use the filters to easily identify datasets based on their **migration status** and **version**:

* **Migration status filter** — view datasets that are *Pending*, *In progress*, or *Completed migration*.
* **Version filter** — filter by dataset version (**v3** or **v4**) to focus on datasets still awaiting migration or already upgraded.

This makes it simple to track migration progress, validate completed transitions, and identify datasets that still require attention.

<figure><img src="/files/oD2S2lGIDuxF69UX1yuJ" alt=""><figcaption></figcaption></figure>

## Re-running a migration

Once a dataset has been successfully migrated to v4, Soda prevents it from being migrated again. To re-run the migration, delete the v4 dataset, then start the migration again.

## How checks are matched

The migration process uses the **qualifier** field (see [Contract Language reference](/reference/contract-language-reference.md#check-qualifiers)) to identify which v3 checks should be migrated. The qualifier values are set to the v3 check IDs.

Because the qualifier is part of the check's identity algorithm, it is important **not to change the qualifier** after migration. Changing it would result in a **loss of history for the checks** in Soda Cloud.

## Current limitations

### Translation step

#### **Check types**

The following check types are not yet supported:

* [Group By check](/reference/contract-language-reference.md#group-by-check)
* Any reconciliation checks
* Reference check
* Any checks using [anomaly score](/soda-v3/sodacl-reference/anomaly-score.md) or [anomaly detection](/soda-v3/sodacl-reference/anomaly-detection.md)

#### **Dataset filters**

Dataset filters (see [In-check vs. dataset filters](https://docs.soda.io/reference/spaces/oV0A6Eua8LUIyWgHxsjf/pages/yBqe6yM8bkyesdvcy0Pd#in-check-vs.-dataset-filters)) are currently not migrated.

#### Column casting

Data contracts do not support column casting yet. When casting is detected in a check, the check is not translated and is excluded from the migration.

#### **Variables**

**Variables in names**

If variables are used in the column name, the check will not be translated and will be excluded from the migration.

Example:

```yaml
checks for ECOMMERCE_ORDERS [test]:
  - missing_count(${ORDER_ID_COL}) = 0:
      name: Must not have null values
```
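
To estimate in advance which v3 checks fall into this category, you can scan your SodaCL files for variables inside a metric's parentheses. The sketch below is a rough heuristic under stated assumptions: the regular expression and the function name `find_variable_column_checks` are illustrative and do not reflect how the migration tool detects these checks, and the `MIN_ROWS` variable in the sample is hypothetical.

```python
import re

# Matches a metric call whose arguments contain a ${VARIABLE} reference,
# e.g. missing_count(${ORDER_ID_COL})
VARIABLE_IN_ARGS = re.compile(r"\w+\([^)]*\$\{[^}]+\}[^)]*\)")


def find_variable_column_checks(sodacl_text):
    """Return (line_number, line) pairs where a variable is used as a column name."""
    hits = []
    for line_number, line in enumerate(sodacl_text.splitlines(), start=1):
        if VARIABLE_IN_ARGS.search(line):
            hits.append((line_number, line.strip()))
    return hits


SAMPLE = """\
checks for ECOMMERCE_ORDERS [test]:
  - missing_count(${ORDER_ID_COL}) = 0:
      name: Must not have null values
  - row_count > ${MIN_ROWS}
"""

for line_number, line in find_variable_column_checks(SAMPLE):
    print(f"line {line_number}: {line}")
```

In the sample, only the `missing_count` line is reported; the variable used as a threshold is outside the parentheses and is ignored by this heuristic.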

**Variables default values**

Variables used in SodaCL are automatically added to the data contract without a default value. You can add default values manually after the migration.


