AI Forecast Operator

The AI Forecast Operator uses historical time series data to generate forecasts for future trends.

This operator simplifies and accelerates the data science process by automating model selection, hyperparameter tuning, and feature identification for a specific prediction task.

The Operator is easy to use and extend, and as powerful as a team of data scientists. To get started with a forecast, use the following YAML configuration:
kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    target_column: y

This example is extended in various ways throughout this documentation, but all parameters beyond those shown are optional.

For more information, see the Forecasting section of the ADS documentation.

Modeling Options

No perfect model exists. A core feature of the Operator is the ability to select from various model frameworks. For enterprise AI, typically one or two frameworks perform best for the problem space. Each model is optimized for different assumptions, such as dataset size, frequency, complexity, and seasonality. The best way to decide which framework is correct for you is through empirical testing. Based on experience with several enterprise forecasting problems, the ADS team has found the following frameworks to be the most effective, ranging from traditional statistical models to complex machine learning and deep neural networks:
  • Prophet
  • ARIMA
  • LightGBM
  • NeuralProphet
  • AutoTS
Note

AutoTS isn't a single modeling framework but a combination of many. AutoTS algorithms include (v0.6.15): ConstantNaive, LastValueNaive, AverageValueNaive, GLS, GLM, ETS, ARIMA, FBProphet, RollingRegression, GluonTS, SeasonalNaive, UnobservedComponents, VECM, DynamicFactor, MotifSimulation, WindowRegression, VAR, DatepartRegression, UnivariateRegression, UnivariateMotif, MultivariateMotif, NVAR, MultivariateRegression, SectionalMotif, Theta, ARDL, NeuralProphet, DynamicFactorMQ, PytorchForecasting, ARCH, RRVAR, MAR, TMF, LATC, KalmanStateSpace, MetricMotif, Cassandra, SeasonalityMotif, MLEnsemble, PreprocessingRegression, FFT, BallTreeMultivariateMotif, TiDE, NeuralForecast, DMD.

Auto-Select

For users new to forecasting, the Operator also has an auto-select option. This is the most computationally expensive option because it splits the training data into several validation sets, evaluates each framework, and tries to decide the best one. However, auto-select isn't guaranteed to find the best model and isn't recommended as the default configuration for end users because of its complexity.
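
The split-evaluate-select procedure described above can be illustrated with a minimal sketch. It is not the Operator's implementation: the three forecasters are hypothetical stand-ins for the real frameworks, and the rolling-origin split, the RMSE scoring, and every function name are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical stand-in forecasters (assumptions, not Operator frameworks).
def last_value(history, horizon):
    return [history[-1]] * horizon

def average_value(history, horizon):
    return [mean(history)] * horizon

def seasonal_naive(history, horizon, period=4):
    # Repeat the most recent full season.
    season = history[-period:]
    return [season[i % period] for i in range(horizon)]

def rmse(actual, predicted):
    return mean((a - p) ** 2 for a, p in zip(actual, predicted)) ** 0.5

def rolling_splits(series, horizon, n_splits):
    """Yield (train, validation) pairs; each validation window is `horizon` long."""
    for k in range(n_splits, 0, -1):
        cutoff = len(series) - k * horizon
        yield series[:cutoff], series[cutoff:cutoff + horizon]

def auto_select(series, candidates, horizon, n_splits=3):
    """Score every candidate on every split and return the lowest mean RMSE."""
    scores = {}
    for name, forecaster in candidates.items():
        errors = [
            rmse(valid, forecaster(train, horizon))
            for train, valid in rolling_splits(series, horizon, n_splits)
        ]
        scores[name] = mean(errors)
    best = min(scores, key=scores.get)
    return best, scores

# Toy series with a repeating pattern of length 4.
series = [10, 20, 30, 40] * 6
candidates = {
    "last_value": last_value,
    "average_value": average_value,
    "seasonal_naive": seasonal_naive,
}
best, scores = auto_select(series, candidates, horizon=3)
print(best, scores)
```

Each candidate is fitted and scored once per split, so the cost grows with the number of candidates multiplied by the number of splits. This is why auto-select is the most expensive option.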

Specify the Model

You can manually select the required model from the list in Modeling Options and insert it into the model parameter slot. For example:
kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    model: <INSERT_MODEL_NAME_HERE>
    target_column: y
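
For empirical testing, it can help to generate one configuration per candidate model. The following sketch does that with the standard library only. The helper names and the tiny YAML emitter are hypothetical, and the emitter handles only nested mappings of simple scalars. Only the spelling prophet appears in this documentation. The other model names are assumed lowercase spellings of the listed frameworks, so check the accepted values for your ADS version.

```python
import copy

BASE_CONFIG = {
    "kind": "operator",
    "type": "forecast",
    "version": "v1",
    "spec": {
        "datetime_column": {"name": "ds"},
        "historical_data": {
            "url": "https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv"
        },
        "horizon": 3,
        "target_column": "y",
    },
}

# Assumption: lowercase spellings; only "prophet" is confirmed by this page.
CANDIDATE_MODELS = ["prophet", "arima", "lightgbm", "neuralprophet", "autots"]

def to_yaml_lines(mapping, indent=0):
    """Emit nested dicts of simple scalars as YAML lines (illustrative only)."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml_lines(value, indent + 4))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines

def build_configs(models=CANDIDATE_MODELS):
    """Return {model_name: yaml_text}, one configuration per model."""
    configs = {}
    for name in models:
        config = copy.deepcopy(BASE_CONFIG)
        config["spec"]["model"] = name
        configs[name] = "\n".join(to_yaml_lines(config)) + "\n"
    return configs

configs = build_configs()
print(configs["prophet"])
```

Each generated text can be saved to its own file and run separately, which keeps the comparison between frameworks reproducible.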

Evaluation and Explanation

As an enterprise AI solution, the Operator treats the evaluation and explanation of forecasts as being as critical as the forecasts themselves.

Reporting

With every operator run, a report is generated to summarize the work done. The report includes:
  • A summary of the input data.
  • A visualization of the forecast.
  • A listing of major trends.
  • An explanation (using SHAP values) of extra features.
  • A table of metrics.
  • A copy of the configuration YAML file.

Metrics

Different use cases optimize for different metrics. The Operator lets users specify the metric they want to optimize from the following list:
  • MAPE
  • RMSE
  • SMAPE
  • MSE
Optionally, the metric can be specified in the YAML file:
kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    model: prophet
    target_column: y
    metric: rmse
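
For reference, the four metrics can be written out in plain Python. This is a sketch of the common textbook definitions, not necessarily the exact formulas the Operator uses. SMAPE in particular has several variants, and the one shown here (twice the absolute error divided by the sum of the absolute values) is an assumption.

```python
from math import sqrt

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return sqrt(mse(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def smape(actual, predicted):
    """Symmetric MAPE (one common variant, bounded between 0 and 200)."""
    terms = (2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted))
    return 100 * sum(terms) / len(actual)

# Made-up values for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

for metric in (mape, rmse, smape, mse):
    print(metric.__name__, metric(actual, predicted))
```

MSE and RMSE penalize large errors more heavily, while MAPE and SMAPE are relative measures that make series of different scales comparable.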

Explanations

When extra data is provided, the Operator can optionally generate explanations for these features (columns) using SHAP values. You can enable explanations in the YAML file:
kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_pedestrians_covid.csv
    additional_data:
        url: additional_data.csv
    horizon: 3
    model: prophet
    target_column: y
    generate_explanations: True
With large datasets, SHAP values can be expensive to generate. Enterprise applications might vary in their need for decimal accuracy compared to computational cost. Therefore, the Operator offers several options:
  • FAST_APPROXIMATE (default): Generated SHAP values are typically within 1% of the true values and require 1% of the computation time of HIGH_ACCURACY.
  • BALANCED: Generated SHAP values are typically within 0.1% of the true values and require 10% of the computation time of HIGH_ACCURACY.
  • HIGH_ACCURACY: Generates the true SHAP values at full precision.
To set the mode, use the explanations_accuracy_mode parameter:
kind: operator
type: forecast
version: v1
spec:
    datetime_column:
        name: ds
    historical_data:
        url: https://raw.githubusercontent.com/facebook/prophet/main/examples/example_yosemite_temps.csv
    horizon: 3
    model: prophet
    target_column: y
    generate_explanations: True
    explanations_accuracy_mode: BALANCED
Selecting the best accuracy mode requires empirical testing, but FAST_APPROXIMATE is most often enough for real-world data.
Note

The previous example doesn't generate explanations because no extra data is provided. The SHAP values are 100% for the feature y.
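
The trade-off between accuracy and cost can be illustrated with a small sketch that computes Shapley values exactly and by sampling. It does not show how the Operator or the SHAP library implements the accuracy modes. The toy model, the zero baseline, and all function names are hypothetical, chosen only to show why an approximation is cheaper than the exact computation.

```python
import itertools
import random

def model(x):
    # Hypothetical toy model with an interaction between features 1 and 2.
    return 3.0 * x[0] + 2.0 * x[1] * x[2]

def marginal_contributions(order, x, baseline, f):
    """Switch features from baseline to x in `order`; record each change in f."""
    current = list(baseline)
    contributions = [0.0] * len(x)
    previous = f(current)
    for i in order:
        current[i] = x[i]
        value = f(current)
        contributions[i] = value - previous
        previous = value
    return contributions

def exact_shapley(x, baseline, f):
    """Average over all n! feature orderings (cost grows factorially)."""
    orders = list(itertools.permutations(range(len(x))))
    totals = [0.0] * len(x)
    for order in orders:
        for i, c in enumerate(marginal_contributions(order, x, baseline, f)):
            totals[i] += c
    return [t / len(orders) for t in totals]

def sampled_shapley(x, baseline, f, n_samples, seed=0):
    """Average over a fixed number of random orderings (cost is n_samples)."""
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(n_samples):
        order = list(range(len(x)))
        rng.shuffle(order)
        for i, c in enumerate(marginal_contributions(order, x, baseline, f)):
            totals[i] += c
    return [t / n_samples for t in totals]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
exact = exact_shapley(x, baseline, model)
approx = sampled_shapley(x, baseline, model, n_samples=200)
print("exact: ", exact)
print("approx:", approx)
```

With three features, the exact version evaluates only six orderings, but the count grows factorially with the number of features. The sampled version keeps the cost fixed and accepts a small error in return.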