Write checks with SodaGPT
Last modified on 27-Sep-23
SodaGPT is a generative AI assistant for data quality testing.
Log in to your Soda Cloud account, click the Ask SodaGPT button in the main nav, then provide natural language instructions to the interface to receive fully-formed, syntax-correct checks in the Soda Checks Language (SodaCL). If you do not already have an account, sign up for Soda Cloud for a 45-day free trial.
Use the generated checks to test data quality in your data pipeline, in your development workflow, and in your Soda Agreements to prevent data quality issues from causing downstream impact.
Instruction parameters
- Provide instructions in English.
- SodaGPT writes one data quality check at a time.
- SodaGPT only outputs SodaCL.
- Provide the following information in your instruction:
  - the name of your dataset
  - the name of at least one column in that dataset
- SodaGPT is capable of writing the following types of SodaCL checks:
- SodaGPT does not retain a history of interactions, so it cannot reference a previously-asked question or response.
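As an illustration, an instruction such as "Check that the email_address column in the dim_customer dataset has no missing values" meets the parameters above: it is in English, asks for a single check, and names both a dataset and a column. The SodaCL returned for it would look roughly like the sketch below. The dataset name `dim_customer` and the column name `email_address` are hypothetical, and the exact output SodaGPT produces may differ.

```yaml
# Hypothetical dataset (dim_customer) and column (email_address), for illustration only
checks for dim_customer:
  # Fails when any row has a missing email_address value
  - missing_count(email_address) = 0
```

Because SodaGPT does not retain history, a follow-up such as "now do the same for phone_number" would not work; each instruction needs to restate the dataset and column.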
About the AI
SodaGPT uses Soda technology to translate natural language requirements into SodaCL checks. It is not related to GPT-3, GPT-4, ChatGPT, or OpenAI.
For SodaGPT’s functionality, Soda trained a specialized Large Language Model (LLM) based on the open-source Falcon-7B model. The model currently does not learn from user input, and will never learn sensitive information from one user and expose it to another.
SodaGPT only accepts the instructions you input in the chat; it does not collect or store any other data. Soda does not send the input to third parties.
Go further
- Need help? Join the Soda community on Slack.
- Get started with Soda by following a tutorial.
- Consider using check suggestions to profile your data and suggest basic checks for data quality.
Documentation always applies to the latest version of Soda products