Anomaly Detection Algorithms

Anomaly Detection uses machine learning (ML) algorithms to learn the patterns in a dataset and detect anomalies in it.

Univariate algorithms work with only one signal or sensor. Typically, these algorithms build one model per signal, and each model is used to identify anomalies in that signal. With the Anomaly Detection service, you can train a single model for multiple signals within a dataset, because the service manages the mapping of sensors or signals to models internally.

By default, model training happens using univariate algorithms. However, you can override this behavior using the Anomaly Detection API.

Univariate Algorithm

Anomaly Detection helps you to identify anomalies in a univariate dataset.

The training and testing data can only contain timestamps and other numeric attributes that typically represent sensor or signal readings.

[Figure: Graph showing blood sugar levels over a period of time, including the timestamps.]

  • Types of univariate time series patterns that Anomaly Detection can identify accurately:

    • Seasonal Patterns

    • Flat trend

    • Continuously increasing or decreasing linear trends

  • Types of anomalies that Anomaly Detection can identify accurately:

    • Point anomalies

    • Spike

The univariate algorithm builds one model per signal and is one of the best classic ML algorithms. Signals that MSET2 considers to have low correlations are automatically treated as univariate and modeled with this algorithm.

The univariate algorithm isn’t standalone and uses the existing multivariate-based API with the same data input format. The univariate model for each univariate signal is built, optimized, and saved independently. In addition, the models are used for inferencing separately.
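
As an illustration of the one-model-per-signal idea, the following minimal Python sketch trains and applies an independent model for each signal. The `ToySignalModel` class, its z-score rule, and the function names are hypothetical stand-ins chosen for this example. They aren't the service's algorithm or API.

```python
from statistics import mean, stdev


class ToySignalModel:
    """Hypothetical stand-in for a per-signal model (not the service's algorithm).

    It flags values that lie far from the mean of the training data.
    """

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.mu = 0.0
        self.sigma = 1.0

    def fit(self, values):
        self.mu = mean(values)
        # Guard against a zero standard deviation on perfectly flat signals.
        self.sigma = stdev(values) or 1e-9
        return self

    def detect(self, values):
        return [abs(v - self.mu) / self.sigma > self.threshold for v in values]


def train_per_signal(training_data):
    """Build one independent model for each signal (column) in the dataset."""
    return {name: ToySignalModel().fit(values) for name, values in training_data.items()}


def detect_per_signal(models, detection_data):
    """Score each signal with its own model only."""
    return {name: models[name].detect(values) for name, values in detection_data.items()}


training = {
    "temperature": [20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 20.3, 20.0],
    "pressure": [101.0, 101.2, 100.9, 101.1, 101.0, 100.8, 101.1, 101.0],
}
models = train_per_signal(training)
flags = detect_per_signal(
    models, {"temperature": [20.1, 35.0], "pressure": [101.0, 101.1]}
)
print(flags)
```

Because each signal has its own entry in `models`, an anomaly in one signal has no effect on how another signal is scored.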

Capability

It detects anomalies in a signal by considering its time-series patterns, and works on point or contextual anomalies.

Requirements
  • The detection dataset can have anomalous data points.

  • The training and inferencing datasets can contain numerical values only. Categorical or nominal values aren’t supported.
  • The algorithm uses a window-based feature engineering approach. It requires one extra window of data before the actual training or detection data to learn the patterns or detect anomalies. The minimum total number of timestamps is 80.
  • The training dataset must include all of the different normal business scenarios. For example, include at least one business cycle in the training portion.
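
The requirements above can be sketched as a pre-flight check in Python. This is a minimal illustration under stated assumptions: the function names and the `window_size` parameter are hypothetical, because the window length that the service uses isn't stated here. Only the 80-timestamp minimum and the numeric-only rule come from the requirements.

```python
MIN_TIMESTAMPS = 80  # minimum total number of timestamps stated in the requirements


def check_univariate_input(values, window_size):
    """Return a list of problems found; an empty list means the checks passed.

    window_size is a hypothetical parameter: the window length used by the
    service is not stated in the requirements.
    """
    problems = []
    if len(values) < MIN_TIMESTAMPS:
        problems.append(f"need at least {MIN_TIMESTAMPS} timestamps, got {len(values)}")
    if len(values) <= window_size:
        problems.append("no rows left after reserving one leading window")
    # bool is a subclass of int, so exclude it explicitly.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        problems.append("non-numeric value found; categorical values aren't supported")
    return problems


def sliding_windows(values, window_size):
    """Pair each scorable reading with the window of readings that precedes it."""
    return [
        (values[i - window_size:i], values[i])
        for i in range(window_size, len(values))
    ]


readings = [float(i % 10) for i in range(100)]
print(check_univariate_input(readings, window_size=20))
pairs = sliding_windows(readings, window_size=20)
print(len(pairs))  # 100 readings minus one leading window of 20
```

The first `window_size` readings only provide context for later readings, which is why the data must extend one window before the portion that is actually trained on or scored.
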
Use Cases

Univariate anomaly detection use cases are found across industries. Univariate signals aren’t correlated with other signals and have to be monitored individually.

Restrictions
  • The algorithm treats only one signal at a time, so collective anomalies among multiple signals aren't addressed.
  • The univariate algorithm isn’t standalone and uses the existing multivariate-based API with the same data input format.

Multivariate Algorithm

The multivariate algorithm helps you to identify anomalies in a multivariate dataset.

Anomaly Detection automatically analyzes the dataset to build multivariate machine learning models of the signals by considering the correlations among them. Anomaly Detection helps you monitor complex systems with a large number of signals.

[Figure: A graph of sensors showing the early warning that MSET2 provides in anomaly detection.]

The Anomaly Detection service uses MSET2 as the main kernel to detect multivariate time-series anomalies from datasets. MSET2 stands for three techniques:

  • Multivariate State Estimation Technique (MSET)

  • Sequential Probability Ratio Test (SPRT)

  • Intelligent Data Processing (IDP)

All of these techniques were invented by Oracle Labs. The MSET2 algorithm is successfully used in several industries for prognosis analysis.

Capability

It detects point, contextual, and collective anomalies in multivariate datasets with highly correlated numerical signals. It can handle datasets with a moderate level of missing values, and provides estimated values.

Requirements
  • The training and inferencing datasets can contain numerical values only. Categorical or nominal values aren't supported.
  • The correlations between signals must be relatively high. For example, the average pair-wise Pearson correlation between one signal and the rest of the signals is no less than 0.1. The kernel excludes signals with lower correlations and treats them with univariate modeling.
  • The training dataset must be anomaly free. For example, the dataset contains normal business scenarios and data values without rare anomaly events.
  • The training dataset must include all of the different normal business scenarios. For example, include at least one business cycle in the training portion. Missing some normal business patterns might lead to false positives during inferencing.
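
The correlation requirement above can be explored before training with a short Python sketch. It is an approximation under stated assumptions, not the kernel's actual procedure: the function names are hypothetical, and averaging the absolute value of each pair-wise correlation is an assumption made here, because the requirement doesn't say how negative correlations are handled. Only the 0.1 threshold comes from the requirement.

```python
from math import sqrt

CORRELATION_THRESHOLD = 0.1  # threshold taken from the requirement above


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 when either signal is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def split_by_correlation(signals, threshold=CORRELATION_THRESHOLD):
    """Partition signal names by their average correlation with all other signals.

    Assumption: the absolute value of each pair-wise correlation is averaged.
    """
    multivariate, univariate = [], []
    for name, values in signals.items():
        others = [v for other, v in signals.items() if other != name]
        if not others:
            # A lone signal has nothing to correlate with.
            univariate.append(name)
            continue
        avg = sum(abs(pearson(values, v)) for v in others) / len(others)
        (multivariate if avg >= threshold else univariate).append(name)
    return multivariate, univariate


signals = {
    "flow": [1, 2, 3, 4, 5, 6, 7, 8],
    "pressure": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2],
    "noise": [1, -1, -1, 1, 1, -1, -1, 1],
}
multivariate, univariate = split_by_correlation(signals)
print("multivariate:", multivariate)
print("univariate:", univariate)
```

In this toy dataset, `flow` and `pressure` rise together and stay in the multivariate group, while `noise` is nearly uncorrelated with both and falls back to univariate modeling.
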
Use Cases

Typical MSET2 use cases are in the manufacturing, IoT, transportation, oil and gas, and energy industries, because the data comes from a single system or asset with well-correlated signals.

Restrictions

Don't use MSET2 to detect anomalies in datasets that aren't numerical, aren't highly correlated, or aren't time-series based.