> For the complete documentation index, see [llms.txt](https://docs.soda.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.soda.io/soda-documentation/soda-v3/soda-cloud-api-v1/api2csv-example.md). # GET checks into CSV files This example Python script uses Pandas and the Soda Cloud API to retrieve information from your Soda Cloud account and prepare two CSV files: * one file contains information about all the **datasets** in your Soda Cloud account * one file contains a list of all the **checks** in your Soda Cloud account, including a column attributes you may have added to checks ✖️ Requires Soda Core Scientific (included in a Soda Runner)\ ✖️ Supported in Soda Core\ ✔️ Supported in Soda Cloud + Soda Library\ ✔️ Supported in Soda Cloud + Soda Runner\
## Prerequisites * You have set up a Soda Cloud account and have connected to a data source via Soda Library or a Soda-hosted or self-hosted Soda Runner. * You have prepared at least one data quality check and have run at least one Soda scan in Soda Cloud. * You have Admin privileges in your Soda Cloud account which gives you permission to access to the account and all its checks and datasets via the API. ## Set up and run 1. Best practice dictates that you use a virtual environment to execute the script. Use the following commands to create and activate a venv, then upgrade pip and install pandas.\ MacOS:
```shell python -m venv .sodaapi source .sodaapi/bin/activate pip install --upgrade pip pip install requests pandas ``` Windows:
```shell python -m venv .sodaapi source .sodaapi\Scripts\activate python.exe -m pip install --upgrade pip python.exe -m pip install requests pandas ``` 2. Create a new Python file called `api2csv.py`, then copy the script below into the file. 3. In the `# Soda Cloud account` section of the file, replace the placeholder values with your own Soda Cloud URL and API key values. See [Generate API keys](/soda-documentation/soda-v3/use-case-guides/api-keys.md) for instructions on how to create new keys. 4. If you wish, change the value of `file_type` in the script to `json` if you prefer that type of output over CSV. 5. Run the script! It saves the new files to the same directory as your `api2csv.py` file. ```shell python api2csv.py ``` Output: ```shell Starting collection of all datasets/checks... Fetching all datasets on page: 0 The following csv file was generated successfully: all_datasets Fetching all checks on page 0 The following csv file was generated successfully: all_checks ```
#### API to CSV script ```python import requests import time import sys import pandas as pd from datetime import datetime # Soda Cloud account soda_cloud_url = 'https://cloud.us.soda.io' # cloud.us.soda.io (US Customers) or cloud.soda.io (EU Customers) soda_apikey = 'XXX' # Soda Cloud API key ID soda_apikey_secret = 'XXX' # Soda Cloud API key secret # The file names and file type to which you want to save the output (CSV or JSON) datasets_filename = 'all_datasets' checks_filename = 'all_checks' file_type = 'csv' # ------------------------------------------------------------------------------ if file_type not in ['csv', 'json']: print("You must input the correct file_type: csv or json. You input "+file_type) sys.exit() print("Starting collection of all datasets/checks...") datasets = [] checks = [] response_datasets = requests.get( soda_cloud_url + '/api/v1/datasets?page=0', auth=(soda_apikey , soda_apikey_secret) ) if response_datasets.status_code == 401 or response_datasets.status_code == 403: print("Unauthorized or Forbidden access. Please check your API keys and/or permissions in Soda.") sys.exit() # Collect all datasets if response_datasets.status_code == 200: dataset_pages = response_datasets.json().get('totalPages') i = 0 while i < dataset_pages: dq_datasets = requests.get( soda_cloud_url + '/api/v1/datasets?page='+str(i), auth=(soda_apikey , soda_apikey_secret)) if dq_datasets.status_code == 200: print("Fetching all datasets on page: "+str(i)) list = dq_datasets.json().get("content") datasets.extend(list) i += 1 elif dq_datasets.status_code == 429: print("API Rate Limit reached when fetching datasets on page: " +str(i)+ ". Pausing for 30 seconds.") time.sleep(30) # Retry fetching the same page else: print("Error fetching datasets on page "+str(i)+". Status code:", dq_datasets.status_code) else: print("Error fetching initial datasets. Status code:", response_datasets.status_code) sys(exit) df_datasets = pd.DataFrame(datasets) current_time = datetime.now().strftime('%Y-%m-%d %I:%M:%S %p') df_datasets.insert(0, 'record_created', current_time) # Create datasets file if file_type == 'csv': df_datasets.to_csv(datasets_filename +'.csv', index=False) if file_type == 'json': df_datasets.to_json(datasets_filename +'.json', orient='records', date_format='iso', lines=False) print(f"The following {file_type} file was generated successfully: {datasets_filename}") # Fetch all checks response_checks = requests.get( soda_cloud_url + '/api/v1/checks?size=100', auth=(soda_apikey , soda_apikey_secret)) if response_checks.status_code == 200: check_pages = response_checks.json().get('totalPages') i = 0 while i < check_pages: dq_checks = requests.get( soda_cloud_url + '/api/v1/checks?size=100&page='+str(i), auth=(soda_apikey , soda_apikey_secret)) if dq_checks.status_code == 200: print("Fetching all checks on page "+str(i)) check_list = dq_checks.json().get("content") checks.extend(check_list) i += 1 elif dq_checks.status_code == 429: print("API Rate Limit reached when fetching checks on page: " +str(i)+ ". Pausing for 30 seconds.") time.sleep(30) # Retry fetching the same page else: print("Error fetching checks on page "+str(i)+". Status code:", dq_checks.status_code) else: print("Error fetching initial checks. Status code:", response_checks.status_code) sys(exit) df_checks = pd.DataFrame(checks) # Clean up dataframe df_checks.insert(0, 'record_created', current_time) df_checks['dataset_id'] = df_checks['datasets'].apply(lambda x: x[0]['id'] if x else None) df_checks['dataset_name'] = df_checks['datasets'].apply(lambda x: x[0]['name'] if x else None) df_checks['dataset_url'] = df_checks['datasets'].apply(lambda x: x[0]['cloudUrl'] if x else None) df_checks['lastCheckResultValue'] = df_checks['lastCheckResultValue'].apply(lambda x: x.get('value') if isinstance(x, dict) and 'value' in x else (x.get('valueLabel') if isinstance(x, dict) and 'valueLabel' in x else x)) df_checks['attributes'] = df_checks['attributes'].fillna({}) df_checks['check_owner'] = df_checks['owner'].apply(lambda x: x.get('firstName', '') + ' ' + x.get('lastName', '') if isinstance(x, dict) else '') df_checks['owner_email'] = df_checks['owner'].apply(lambda x: x.get('email', '') if isinstance(x, dict) else '') # Rename columns df_checks.rename(columns={'id': 'check_id'}, inplace=True) df_checks.rename(columns={'name': 'check_name'}, inplace=True) df_checks.rename(columns={'evaluationStatus': 'check_status'}, inplace=True) df_checks.rename(columns={'definition': 'check_definition'}, inplace=True) df_checks.rename(columns={'cloudUrl': 'check_url'}, inplace=True) # Add attribute names as separate columns with attribute values for index, row in df_checks.iterrows(): attributes_dict = row['attributes'] for key in attributes_dict: column_name = key column_value = attributes_dict[key] df_checks.at[index, column_name] = column_value # Drop original columns df_checks = df_checks.drop(columns=['attributes']) df_checks = df_checks.drop(columns=['datasets']) df_checks = df_checks.drop(columns=['owner']) # Create checks file if file_type == 'json': df_checks.to_json(checks_filename +'.json', orient='records', date_format='iso', lines=False) if file_type == 'csv': df_checks.to_csv(checks_filename +'.csv', index=False) print(f"The following {file_type} file was generated successfully: {checks_filename}") ``` ## Go further > Need help? Join the [Soda community on Slack](https://community.soda.io/slack). --- # Agent Instructions This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com. ## Querying This Documentation If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question. Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter: ``` GET https://docs.soda.io/soda-documentation/soda-v3/soda-cloud-api-v1/api2csv-example.md?ask=&goal= ``` `ask` is the immediate question: it should be specific, self-contained, and written in natural language. `goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation. Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.