
Anomaly score checks

Use an anomaly score check to automatically discover anomalies in your time-series data.
Requires Soda Cloud and Soda Core Scientific.

checks for dim_customer:
  - anomaly score for row_count < default

About anomaly score checks
Prerequisites
Install Soda Core Scientific
Define an anomaly score check
Anomaly score check results
Optional check configurations
List of comparison symbols and phrases
Troubleshoot Soda Core Scientific installation
Go further

About anomaly score checks

The anomaly score check is powered by a machine learning algorithm that works with measured values for a metric that occur over time. The algorithm learns the patterns of your data – its trends and seasonality – to identify and flag anomalies in time-series data.

If you have connected Soda Core to a Soda Cloud account, Soda Core pushes check results to your cloud account, where Soda Cloud stores all the previously measured, historic values for your checks in the Cloud Metric Store. SodaCL can then use these stored values to establish a baseline of normal metric values against which to evaluate future metric values to identify anomalies. Therefore, you must have created and connected a Soda Cloud account to use anomaly score checks.
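To make the idea of a baseline concrete, the following Python sketch flags a new measurement that strays far from previously stored values. It is a deliberately simplified stand-in that uses a mean and standard deviation (z-score) test. It is not the machine learning model that Soda uses and, unlike that model, it ignores trend and seasonality. The function name and the default threshold of 3 standard deviations are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import Optional, Sequence

MIN_MEASUREMENTS = 4  # mirrors the "at least 4 measurements" message in the scan log


def is_anomalous(history: Sequence[float], value: float, threshold: float = 3.0) -> Optional[bool]:
    """Simplified z-score test of `value` against a baseline of historic metric values.

    Returns None when there is not enough history to judge; otherwise returns True
    if `value` lies more than `threshold` standard deviations from the historic mean.
    """
    if len(history) < MIN_MEASUREMENTS:
        return None  # comparable to a NOT EVALUATED check result
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all counts as unusual.
        return value != baseline
    return abs(value - baseline) / spread > threshold


if __name__ == "__main__":
    row_counts = [1000, 1004, 998, 1001, 999]  # hypothetical historic row_count values
    print(is_anomalous(row_counts, 1002))  # False: within the normal range
    print(is_anomalous(row_counts, 1500))  # True: far outside the baseline
```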

Prerequisites

Install Soda Core Scientific

To use an anomaly score check, you must install Soda Core Scientific in the same directory or virtual environment in which you installed Soda Core. Best practice recommends installing Soda Core and Soda Core Scientific in a virtual environment to avoid library conflicts, but you can install Soda Core Scientific locally if you prefer.

  1. Set up a virtual environment, as described in the Soda Core install documentation.
  2. Install Soda Core in your new virtual environment.
  3. Use the following command to install Soda Core Scientific.
    pip install soda-core-scientific
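Because Soda Core Scientific must be installed in the same environment as Soda Core, one quick way to confirm the install is to ask the active Python interpreter which versions of both packages it can see. This minimal sketch uses the standard library's importlib.metadata (Python 3.8 or later). It assumes that the base package is published under the distribution name soda-core.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if this interpreter cannot find it."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run this with the same interpreter (virtual environment) that you use for Soda scans.
    for dist in ("soda-core", "soda-core-scientific"):
        print(f"{dist}: {installed_version(dist) or 'not installed in this environment'}")
```

If soda-core is reported but soda-core-scientific is not, you most likely ran pip from a different environment than the one you scan from.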

Note that installing the Soda Core Scientific package also installs several scientific dependencies. Reference the soda-core-scientific setup file in the public GitHub repository for details.

Refer to Troubleshoot Soda Core Scientific installation for help with issues during installation.

Define an anomaly score check

The following example demonstrates how to use the anomaly score for the row_count metric in a check. You can use any numeric, missing, or validity metric in lieu of row_count.

checks for dim_customer:
  - anomaly score for row_count < default
  • Currently, you can only use < default to define the threshold in an anomaly score check.
  • By default, anomaly score checks yield warn check results, not fail check results.


The following example detects anomalies for the average of order_price in an orders dataset.

checks for orders:
  - anomaly score for avg(order_price) < default

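The following example detects anomalies for the count of missing values in the id column of the orders dataset. The missing_values configuration tells Soda to treat the values None and No Value as missing, in addition to NULL.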

checks for orders:
  - anomaly score for missing_count(id) < default:
    missing_values: [None, No Value]

Anomaly score check results

Because the anomaly score check requires at least four data points before it can start detecting what counts as an anomalous measurement, your first few scans will yield a check result that indicates that Soda does not have enough data.

Soda Core 3.0.0xx
Anomaly Detection Frequency Warning: Coerced into daily dataset with last daily time point kept
Data frame must have at least 4 measurements
Skipping anomaly metric check eval because there is not enough historic data yet
Scan summary:
1/1 check NOT EVALUATED: 
    dim_customer in adventureworks
      anomaly score for missing_count(last_name) < default [NOT EVALUATED]
        check_value: None
1 checks not evaluated.
Apart from the checks that have not been evaluated, no failures, no warnings and no errors.
Sending results to Soda Cloud

Though your first instinct may be to run several scans in a row to produce the four measurements that the anomaly score check needs, those measurements don’t “count” unless they occur at a stable frequency.

If, for example, you run eight back-to-back scans in five minutes, the anomaly score check does not register the measurements resulting from those scans as a reliable pattern against which to evaluate an anomaly.
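The scan log above warns that the data was "Coerced into daily dataset with last daily time point kept". Assuming this means that Soda keeps only the latest measurement from each calendar day, the following Python sketch shows why back-to-back scans do not help. Eight scans run within five minutes collapse into a single daily data point, while four scans on four different days yield the four measurements the check needs. The function names and timestamps are hypothetical, and Soda's actual resampling (for example, its time zone handling) may differ.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Tuple

MIN_MEASUREMENTS = 4  # from the "at least 4 measurements" message in the scan log


def coerce_to_daily(measurements: Iterable[Tuple[datetime, float]]) -> Dict[date, float]:
    """Keep only the last measurement recorded on each calendar day."""
    daily: Dict[date, float] = {}
    for timestamp, value in sorted(measurements):
        daily[timestamp.date()] = value  # a later scan on the same day overwrites earlier ones
    return daily


def enough_history(measurements: Iterable[Tuple[datetime, float]]) -> bool:
    """True if the daily-coerced series has enough points to evaluate."""
    return len(coerce_to_daily(measurements)) >= MIN_MEASUREMENTS


start = datetime(2022, 7, 1, 9, 0)

# Eight back-to-back scans, 40 seconds apart (all within five minutes).
burst = [(start + timedelta(seconds=40 * i), 1000 + i) for i in range(8)]

# One scan per day for four days.
daily_scans = [(start + timedelta(days=i), 1000 + i) for i in range(4)]

if __name__ == "__main__":
    print(len(coerce_to_daily(burst)), enough_history(burst))              # 1 False
    print(len(coerce_to_daily(daily_scans)), enough_history(daily_scans))  # 4 True
```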

Consider using the Soda Core Python library to set up a programmatic scan that produces a check result for an anomaly score check on a regular schedule.
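As a starting point, the following Python sketch runs a programmatic scan once a day. It assumes the Scan class that the Soda Core Python library exposes in soda.scan, with its set_data_source_name, add_configuration_yaml_file, add_sodacl_yaml_file, execute, and get_logs_text methods. Confirm these against the programmatic scan documentation for your Soda Core version before relying on them. The data source name, file paths, and 06:00 run time are hypothetical placeholders.

```python
import datetime as dt
import time


def seconds_until_next_run(now: dt.datetime, run_hour: int) -> float:
    """Seconds from `now` until the next occurrence of run_hour:00."""
    next_run = now.replace(hour=run_hour, minute=0, second=0, microsecond=0)
    if next_run <= now:
        next_run += dt.timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (next_run - now).total_seconds()


def run_scan(data_source: str, config_path: str, checks_path: str) -> int:
    # Imported lazily so that the scheduling helper above works without Soda Core installed.
    from soda.scan import Scan

    scan = Scan()
    scan.set_data_source_name(data_source)
    scan.add_configuration_yaml_file(file_path=config_path)
    scan.add_sodacl_yaml_file(checks_path)
    exit_code = scan.execute()
    print(scan.get_logs_text())
    return exit_code


if __name__ == "__main__":
    # Hypothetical names: replace them with your own data source and file paths.
    while True:
        time.sleep(seconds_until_next_run(dt.datetime.now(), run_hour=6))
        run_scan("adventureworks", "configuration.yml", "checks.yml")
```

A long-running loop is the simplest illustration. In production, a scheduler such as cron or a workflow orchestrator is usually a more robust way to call the same run_scan function at a fixed interval.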

Optional check configurations

Configuration | Documentation
Define a name for an anomaly score check. | -
Define alert configurations to specify warn and fail thresholds. | -
Apply a filter to return results for a specific portion of the data in your dataset. | -
Use quotes when identifying dataset names; see example. | Use quotes in a check
Use wildcard characters ( % or * ) in values in the check. | -
Use for each to apply anomaly score checks to multiple datasets in one scan; see example. | Apply checks to multiple datasets
Apply a dataset filter to partition data during a scan; see example. | Scan a portion of your dataset

Example with quotes

checks for "dim_customer":
  - anomaly score for row_count < default

Example with for each

for each dataset T:
  datasets:
    - dim_customer
  checks:
    - anomaly score for row_count < default


List of comparison symbols and phrases

<

Troubleshoot Soda Core Scientific installation

Installing Soda Core Scientific works on Linux, but you may encounter issues if you install it on Mac OS (particularly on machines with the M1 ARM-based processor) or any other operating system. If that is the case, consider using one of the following alternative installation procedures.

Need help? Ask the team in the Soda community on Slack.

Use Docker to run Soda Core

Use Soda’s Docker image in which Soda Core Scientific is pre-installed.

  1. If you have not already done so, install Docker in your local environment.
  2. From Terminal, run the following command to pull the latest official Soda Core Docker image.
    docker pull sodadata/soda-core
    
  3. Verify the pull by running the following command.
    docker run sodadata/soda-core --help
    

    Output:

     Usage: soda [OPTIONS] COMMAND [ARGS]...
    
     Soda Core CLI version 3.0.0bxx
    
     Options:
     --help  Show this message and exit.
    
     Commands:
     scan    runs a scan
     update-dro  updates a distribution reference file
    

    When you run the Docker image on a non-Linux/amd64 platform, you may see the following warning from Docker, which you can ignore.

    WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
    
  4. When you are ready to run a Soda scan, use the following command to run the scan via the Docker image. Replace the placeholder values with your own file paths and names.
    docker run -v /path/to/your_soda_directory:/sodacl sodadata/soda-core scan -d your_data_source -c /sodacl/your_configuration.yml /sodacl/your_checks.yml
    

    Optionally, you can specify the version of Soda Core to use to execute the scan. This may be useful when you do not wish to use the latest released version of Soda Core to run your scans. The example scan command below specifies Soda Core version 3.0.0.

    docker run -v /path/to/your_soda_directory:/sodacl sodadata/soda-core:v3.0.0 scan -d your_data_source -c /sodacl/your_configuration.yml /sodacl/your_checks.yml
    
What does the scan command do?
  • docker run instructs the Docker engine to run a specific image.
  • -v mounts your SodaCL files into the container. In other words, it makes the configuration.yml and checks.yml files in your local environment available to the Docker container. The command example maps your local directory to /sodacl inside the Docker container.
  • sodadata/soda-core refers to the image that docker run must use.
  • scan instructs Soda Core to execute a scan of your data.
  • -d indicates the name of the data source to scan.
  • -c specifies the filepath and name of the configuration YAML file.
  • The final argument, /sodacl/your_checks.yml, specifies the filepath and name of the checks YAML file.
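If you script Docker-based scans, building the command in one place helps you avoid the "Configuration path does not exist" error, which occurs when a file path lacks the /sodacl/ prefix. The following Python sketch assembles the same docker run command shown in step 4 and prefixes both YAML files with the container directory. The function names are hypothetical, and the optional version argument corresponds to an image tag such as v3.0.0.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List, Optional

CONTAINER_DIR = "/sodacl"  # the mount point used in the commands above


def in_container(filename: str) -> str:
    """Map a file in the mounted local directory to its path inside the container."""
    return str(PurePosixPath(CONTAINER_DIR) / filename)


def docker_scan_command(local_dir: str, data_source: str, config_file: str,
                        checks_file: str, version: Optional[str] = None) -> List[str]:
    """Build the argument list for a docker run scan, as in step 4."""
    image = f"sodadata/soda-core:{version}" if version else "sodadata/soda-core"
    return [
        "docker", "run",
        "-v", f"{local_dir}:{CONTAINER_DIR}",  # mount the local Soda directory
        image,
        "scan",
        "-d", data_source,
        "-c", in_container(config_file),
        in_container(checks_file),
    ]


if __name__ == "__main__":
    cmd = docker_scan_command("/path/to/your_soda_directory", "your_data_source",
                              "your_configuration.yml", "your_checks.yml")
    completed = subprocess.run(cmd)
    print("exit code:", completed.returncode)
```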


Error: Mounts denied

If you encounter the following error, follow the procedure below.

docker: Error response from daemon: Mounts denied: 
The path /soda-core-test/files is not shared from the host and is not known to Docker.
You can configure shared paths from Docker -> Preferences... -> Resources -> File Sharing.
See https://docs.docker.com/desktop/mac for more info.

You need to give Docker permission to access your configuration.yml and checks.yml files in your environment. To do so:

  1. Access your Docker Dashboard, then select Preferences (gear symbol).
  2. Select Resources, then follow the Docker instructions to add your Soda project directory – the one you use to store your configuration.yml and checks.yml files – to the list of directories that can be bind-mounted into Docker containers.
  3. Click Apply & Restart, then repeat steps 2 - 4 above.


Error: Configuration path does not exist

If you encounter the following error, double-check the syntax of the scan command in step 4 above.

Soda Core 3.0.0bxx
Configuration path 'configuration.yml' does not exist
Path "checks.yml" does not exist
Scan summary:
No checks found, 0 checks evaluated.
2 errors.
Oops! 2 errors. 0 failures. 0 warnings. 0 pass.
ERRORS:
Configuration path 'configuration.yml' does not exist
Path "checks.yml" does not exist

  • Be sure to prepend /sodacl/ to both the configuration.yml filepath and the checks.yml filepath.
  • Be sure to mount your files into the container by including the -v option. For example, -v /Users/MyName/soda_core_project:/sodacl.


Install Soda Core Scientific locally

The following works on Mac OS on a machine with the M1 ARM-based processor. Consult the sections below to troubleshoot errors that may arise.

From your command-line interface, use the following command to install Soda Core Scientific.

pip install soda-core-scientific

Error: No module named ‘wheel’

If you encounter the following error, follow the procedure below.

Collecting lightgbm>=2.2.3
  Using cached lightgbm-3.3.2.tar.gz (1.5 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/vj/7nxglgz93mv6cv472sl0pnm40000gq/T/pip-install-j0txphmm/lightgbm_327e689fd1a645dfa052e5669c31918c/setup.py", line 17, in <module>
          from wheel.bdist_wheel import bdist_wheel
      ModuleNotFoundError: No module named 'wheel'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
  1. Install wheel.
    pip install wheel
    
  2. Run the command to install Soda Core Scientific, again.
    pip install soda-core-scientific 
    


Error: RuntimeError: Could not find a ‘llvm-config’ binary

If you encounter the following error, follow the procedure below.

      RuntimeError: Could not find a `llvm-config` binary. There are a number of reasons this could occur, please see: https://llvmlite.readthedocs.io/en/latest/admin-guide/install.html#using-pip for help.
      error: command '/Users/yourname/Projects/testing/venv/bin/python3' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llvmlite
  1. To install llvmlite, you must have an llvm-config binary file that the llvmlite installation process uses. In Terminal, use Homebrew to run the following command.
    brew install llvm@11
    
  2. Homebrew installs this file in /opt/homebrew/opt/llvm@11/bin/llvm-config. To ensure that the llvmlite installation process uses this binary file, run the following command.
    export LLVM_CONFIG=/opt/homebrew/opt/llvm@11/bin/llvm-config
    
  3. Run the command to install Soda Core Scientific, again.
    pip install soda-core-scientific 
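Before retrying the install, you can confirm that an llvm-config binary actually exists where the llvmlite build will look for it. This Python sketch checks the LLVM_CONFIG variable set in step 2 and, if that variable is unset, falls back to searching PATH for llvm-config. The PATH fallback is an assumption about llvmlite's build behavior, and the function name is hypothetical.

```python
import os
import shutil
from typing import Mapping, Optional


def find_llvm_config(env: Mapping[str, str]) -> Optional[str]:
    """Return the llvm-config path the llvmlite build is expected to use, or None if it is missing."""
    explicit = env.get("LLVM_CONFIG")
    if explicit:
        # Honor LLVM_CONFIG, but only if it points at an existing file.
        return explicit if os.path.isfile(explicit) else None
    # Assumption: without LLVM_CONFIG, the build looks for llvm-config on PATH.
    return shutil.which("llvm-config")


if __name__ == "__main__":
    path = find_llvm_config(os.environ)
    if path:
        print(f"Found llvm-config at {path}")
    else:
        print("llvm-config not found; check LLVM_CONFIG and your Homebrew install")
```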
    

Go further


Last modified on 01-Jul-22
