Anomaly Detection now includes Univariate Anomaly Detection, Multivariate Anomaly Detection improvements, and Asynchronous Detection

We've added support for univariate signals, so you can now detect different types of anomalies in them: point, collective, and contextual anomalies.

Other improvements include:

  • Performance improvements for both training and evaluation.
    • Now using NumExpr (a fast numerical expression evaluator for NumPy) instead of NumPy for algebraic and transcendental function evaluations.
    • Intel Math Kernel Library (MKL) support to accelerate function evaluation.
  • Faster sklearn pairwise distance metric calculation for improved detection.
    • Implemented step-wise matrix multiplication to replace the loops used in the sklearn package.
  • Implemented efficient memory handling for large batch sizes (up to 10K).
    • Implemented batch-based column processing to calculate the pairwise distance, which avoids the memory issues caused by storing a large matrix.
  • Preprocessing improvements:
    • Interquartile range (IQR) based outlier detection and removal.
    • Trend and seasonality decomposition: Seasonal Trend Decomposition using Loess (STL) or linear detrending.
  • Kernel improvements for OCSVM using automatic hyperparameter tuning: dynamic window size selection using periodicity detection (autocorrelation function and heuristic-based frequency detector).
  • Postprocessing improvements (during detection): excess anomalies are pruned by suppressing anomalies that appear consecutively in groups larger than the window size, which avoids flagging more data points than the window size.
  • User-specified tuning: a new sensitivity parameter in detection lets you adjust the number of anomalies flagged by selecting the appropriate threshold, without having to retrain.
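
The batch-based column processing described above can be illustrated with a minimal sketch. It is not the service's implementation, which is not published here; the function name min_sq_distance_chunked and the chunk_size parameter are hypothetical. It assumes squared Euclidean distance and uses one matrix product per block in place of explicit loops, so only an (n × chunk_size) block is held in memory at any time.

```python
import numpy as np

def min_sq_distance_chunked(X, Y, chunk_size=1024):
    """For each row of X, return the smallest squared Euclidean distance
    to any row of Y, processing Y in blocks (columns of the distance matrix)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_norms = np.einsum("ij,ij->i", X, X)[:, None]      # shape (n, 1)
    best = np.full(X.shape[0], np.inf)
    for start in range(0, Y.shape[0], chunk_size):
        block = Y[start:start + chunk_size]
        y_norms = np.einsum("ij,ij->i", block, block)[None, :]
        # ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, computed with one matrix product
        d = x_norms - 2.0 * (X @ block.T) + y_norms
        np.maximum(d, 0.0, out=d)                       # clip tiny negatives from rounding
        np.minimum(best, d.min(axis=1), out=best)       # keep the running minimum only
    return best
```

Because each block is reduced to a running minimum before the next block is computed, the full pairwise matrix is never materialized.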

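The IQR-based outlier removal listed under the preprocessing improvements can be sketched as follows. This is an illustration only: the helper name iqr_filter is hypothetical, and the multiplier k = 1.5 is the conventional Tukey fence, assumed here because the release notes do not state which value the service uses.

```python
import numpy as np

def iqr_filter(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR].

    Returns the kept values and the boolean mask that selected them.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    mask = (values >= lower) & (values <= upper)
    return values[mask], mask

readings = [10, 11, 12, 11, 10, 12, 11, 95]   # 95 is an obvious outlier
kept, mask = iqr_filter(readings)
```

The mask is returned alongside the filtered values so that timestamps or other aligned columns can be filtered the same way.
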
Added the Asynchronous Detection API for large to very large datasets (from about 100 million to billions of data points). This API:

  • Extends the existing Anomaly Detection service capabilities:
    • Supports large datasets (from 30K data points to 100 million+ data points).
    • Supports training data with up to 1000+ signals* (*available on request).
    • Provides high model accuracy by enabling model training with better model characteristics (window size, memory vectors).
  • Allows input either inline or as a list of objects in object store. The different input modes give you flexibility prior to onboarding to the service.

Other asynchronous improvements include:

  • Encryption of intermediate data.
  • Load balancing: IPVS for network routing within the K8s cluster.
  • Parallel request handling:
    • Optimistic locking for database queries to handle parallel requests.
  • Autoscaling:
    • Horizontal pod autoscaler.
    • Cluster autoscaler.
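
Optimistic locking, mentioned above for parallel request handling, can be shown with a small generic sketch. It uses SQLite from the Python standard library purely for illustration; the jobs table, its columns, and the update_status helper are hypothetical and say nothing about the service's actual database or schema. Each row carries a version number, and an update succeeds only if the version is still the one the writer originally read.

```python
import sqlite3

def update_status(conn, job_id, new_status, expected_version):
    """Apply the update only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE jobs SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, job_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1   # False means another writer got there first

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("INSERT INTO jobs VALUES (1, 'ACCEPTED', 0)")
conn.commit()

# Two workers read the same row at version 0, then both try to write.
first = update_status(conn, 1, "IN_PROGRESS", expected_version=0)
second = update_status(conn, 1, "IN_PROGRESS", expected_version=0)
```

The second writer sees that no row matched and can re-read the row and retry, so no lock is held while requests are processed in parallel.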

This release also introduces the MSET2 multivariate anomaly detection kernel to support large batch size calls (using asynchronous detection), which improves detection accuracy for prognostics use cases.

This helps surveillance-based anomaly detection use cases detect anomalies in the context of a complete historical state:

  • Call the service with the explicit option for multivariate MSET using asynchronous detection.
  • The service computes the state based on cumulative sums, using an appropriately large batch size internally.
  • Retaining the historical context improves performance, resulting in a lower missed alarm rate compared with synchronous detection.
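
To see why retained history lowers the missed alarm rate, consider the following toy sketch. It is not the MSET2 kernel: it assumes a standard one-sided CUSUM recurrence, chosen here only because the notes say the state is based on cumulative sums, and the function name, signal, slack, and threshold values are all hypothetical. A small sustained drift is detected when the statistic accumulates over one large batch, but is missed when the state is reset for every small batch.

```python
def cusum(values, target, slack, start=0.0):
    """One-sided upper CUSUM: s_t = max(0, s_{t-1} + (x_t - target) - slack)."""
    stats = []
    s = start
    for x in values:
        s = max(0.0, s + (x - target) - slack)
        stats.append(s)
    return stats

# 50 normal points followed by a small, sustained upward drift.
signal = [0.0] * 50 + [0.3] * 50
threshold = 5.0

# One large batch: the statistic keeps accumulating across the whole history.
one_large_batch = cusum(signal, target=0.0, slack=0.1)

# Small independent batches: the state is reset to zero every 10 points.
small_batches = []
for i in range(0, len(signal), 10):
    small_batches.extend(cusum(signal[i:i + 10], target=0.0, slack=0.1))

detected_large = any(s > threshold for s in one_large_batch)
detected_small = any(s > threshold for s in small_batches)
```

In this toy setup only the large-batch run crosses the threshold, because the evidence from each small batch is discarded before it can build up.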

The Anomaly Detection service documentation shows you how to use this service. You can find more information in the AI Blog.