Release Notes

23.2.1

Features and Improvements

  • Added the install options automlx[forecasting], automlx[onnx], and automlx[deep-learning] alongside automlx[viz]. Each install option keeps the installation minimal by pulling in only the dependencies needed for the associated task. Options can be combined when more functionality is needed, e.g., automlx[forecasting,viz].

Bug fixes

  • Fixed a bug where a convergence failure in ETSForecaster could cause the entire pipeline to fail.

  • Fixed a bug that caused the pipeline to set the forecast horizon to zero when forecasting short time series (fewer than 8 data points).

  • Fixed a bug that could cause some seasonal decomposition models (e.g., STL) to fail to fit on short series (shorter than three times the seasonality period).

  • Fixed a bug where the BoxCox transformer could produce NaNs from the inverse transformation.

  • Fixed a bug that caused the advanced feature importance sampling strategies to raise an exception.
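For reference, the Box-Cox transform with parameter λ maps y > 0 to (y^λ - 1)/λ (or log y when λ = 0), so its inverse (λz + 1)^(1/λ) is undefined or degenerate whenever λz + 1 ≤ 0, and numerical implementations typically return NaN there. The following is a minimal pure-Python sketch of a guarded inverse. It is not the AutoMLx BoxCox transformer; flooring λz + 1 at a small epsilon is an illustrative assumption (clipping forecasts to the transform's valid range is an equivalent alternative).

```python
import math


def boxcox(y: float, lam: float) -> float:
    """Forward Box-Cox transform; y must be strictly positive."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_boxcox(z: float, lam: float, eps: float = 1e-12) -> float:
    """Inverse Box-Cox transform that never returns NaN.

    For lam != 0 the inverse is (lam * z + 1) ** (1 / lam). Its real-valued
    result is undefined when lam * z + 1 < 0, which can happen when a value in
    the transformed space (e.g. a forecast) falls outside the transform's
    range. This sketch floors the base at eps instead (an assumption).
    """
    if lam == 0:
        return math.exp(z)
    base = lam * z + 1
    if base <= 0:
        base = eps  # illustrative guard; avoids NaN/complex results
    return base ** (1 / lam)
```

A round trip such as `inv_boxcox(boxcox(5.0, 0.5), 0.5)` recovers 5.0 (up to rounding), while an out-of-range input like `inv_boxcox(-10.0, 0.5)` returns a small finite number rather than NaN.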

Possibly breaking changes

  • Deep-learning models for classification (TorchMLPClassifier, CatboostClassifier, TabNetClassifier), regression (TorchMLPRegressor), and anomaly detection (AutoEncoderOD) now require the automlx[deep-learning] install option.

  • Changed the initialization of the logging module to:

    • no longer log to file by default;

    • not overwrite the global logging configuration if it was already set up.
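The two logging changes above follow the standard advice for Python libraries: give the package its own logger with a NullHandler and leave global configuration to the application. A minimal sketch of that pattern with the standard library follows; the logger name "mylib" and the helper init_logging are hypothetical placeholders, not AutoMLx functions.

```python
import logging

# Library code uses a named logger instead of configuring the root logger.
logger = logging.getLogger("mylib")  # "mylib" is a placeholder package name
# A NullHandler keeps the library quiet unless the application configures logging.
logger.addHandler(logging.NullHandler())


def init_logging(level: int = logging.INFO) -> bool:
    """Hypothetical helper: set up console logging only if nobody else has.

    Returns True if this call installed a handler, and False if an existing
    global configuration was left untouched. No file handler is created.
    """
    root = logging.getLogger()
    if root.handlers:
        # The application already configured logging; do not overwrite it.
        return False
    logging.basicConfig(level=level)  # adds a single stderr StreamHandler
    return True
```

Because the helper returns early when the root logger already has handlers, an application that calls `logging.basicConfig()` (or any other setup) first keeps its own configuration.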

23.2.0

Features and Improvements

  • Added support for TabNet classifier.

    • Training TabNet with CPUs is slow, so it is disabled by default until GPU support is added.

    • To enable TabNet, add 'TabNetClassifier' to the model_list when initializing the AutoML Pipeline.

  • New counterfactual Explainer (ACE)

    • Added the AutoMLx Counterfactual Explainer (ACE) for classification and anomaly detection tasks.

    • ACE is faster and finds more valid counterfactuals than DiCE.

    • It is guaranteed to find a counterfactual for each query instance if the reference dataset contains an example of the desired class.

  • Fairness Feature Importance is now available for tabular datasets! MLExplainer has a new explain_model_fairness() function to compute global feature importance attributions for fairness metrics.

  • Added threshold tuning for binary and multi-class classification tasks. Threshold tuning can be enabled by passing threshold_tuning=True when creating the Pipeline object.

  • Python 3.10 support added.
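Threshold tuning replaces the default 0.5 decision threshold with one chosen on held-out data. Below is a minimal sketch for the binary case only, assuming F1 as the objective and the observed scores as candidate thresholds. The objective, candidate set, and function names are assumptions; AutoMLx's actual search strategy and its multi-class handling are not described in these notes.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when predicting the positive class for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def tune_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 over candidate thresholds.

    Candidates are the distinct predicted scores plus the default 0.5.
    Only strict improvements replace the current best, so ties keep 0.5.
    """
    best_t, best_f1 = 0.5, f1_at_threshold(scores, labels, 0.5)
    for t in sorted(set(scores) | {0.5}):
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

With validation scores `[0.1, 0.3, 0.35, 0.4, 0.8]` and labels `[0, 1, 1, 0, 1]`, the default threshold gives F1 = 0.5, while a threshold of 0.3 catches all three positives at the cost of one false positive and gives F1 = 6/7.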

Deprecations

  • Removed support for the Uber Orbit forecaster due to instability in its built-in Bayesian inference engine.

  • Added deprecation warnings to objects that will be removed or replaced in 23.3.0.

    • Deprecations include:

      • Internal (never-documented) attributes of the AutoML pipeline.

      • The dask and spark execution engines and related options.

      • The ModelTune interface.

      • All Pipeline attributes matching *_trials_, which contain information about the trials performed by the AutoML pipeline. These will be replaced by two new dataframe attributes, completed_trials_summary_ and completed_trials_detailed_.

      • AutoML optimization levels 1 and 2.

      • The Pipeline attribute selected_features_. Instead, users should use selected_features_names_ or selected_features_names_raw_ to access the names of the selected engineered or raw features, respectively.

    • Deprecation warnings can be suppressed by running: from automl import init; init(check_deprecation_warnings=False)
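A common way to implement warnings like these is Python's warnings module behind a global on/off switch. The following minimal sketch shows that pattern for a deprecated attribute that forwards to its replacement. The flag _CHECK_DEPRECATION_WARNINGS, the stand-in init, and the toy Pipeline class are hypothetical illustrations that mirror the documented check_deprecation_warnings option; they are not AutoMLx internals.

```python
import warnings

# Hypothetical module-level switch mirroring the check_deprecation_warnings option.
_CHECK_DEPRECATION_WARNINGS = True


def init(check_deprecation_warnings: bool = True) -> None:
    """Illustrative stand-in for a package-level init function."""
    global _CHECK_DEPRECATION_WARNINGS
    _CHECK_DEPRECATION_WARNINGS = check_deprecation_warnings


def _warn_deprecated(name: str, replacement: str, removed_in: str) -> None:
    if _CHECK_DEPRECATION_WARNINGS:
        warnings.warn(
            f"{name} is deprecated and will be removed in {removed_in}; "
            f"use {replacement} instead.",
            FutureWarning,  # unlike DeprecationWarning, shown to end users by default
            stacklevel=3,  # attribute the warning to the caller's line, not this helper
        )


class Pipeline:
    """Toy pipeline with a deprecated attribute that forwards to its replacement."""

    def __init__(self, names):
        self.selected_features_names_ = list(names)

    @property
    def selected_features_(self):
        _warn_deprecated("selected_features_", "selected_features_names_", "23.3.0")
        return self.selected_features_names_
```

Reading `selected_features_` on the toy class emits one FutureWarning; after `init(check_deprecation_warnings=False)` the same access is silent.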

Miscellaneous

  • Bumped package versions:

    • fbprophet==0.7.1 replaced by prophet==1.1.2

    • torch to 1.13.1

    • onnx to 1.12.0

    • onnxruntime to 1.12.1

Possibly breaking changes

  • score_metric is no longer accepted in the MLExplainer factory function. It is now an optional argument to the TabularExplainer's explain_model and explain_model_fairness methods.

23.1.1

Features and Improvements

  • Unsupervised anomaly detection

    • Implemented N-1 experts for hyperparameter tuning

    • Added N-1 experts-based contamination factor identification

  • Overhauled package documentation

Bug fixes

  • Fixed a bug in feature importance explainers that occurred when the dataset contained feature names that are NumPy integers and an AutoML pipeline was being explained.

23.1.0

Features and Improvements

  • Fairness metrics are now available to measure bias in both datasets and trained models. Fairness metrics can be imported from automl.fairness.metrics.

  • Explanations can now be computed from custom user-defined metrics.

  • Introduced the max_tuning_trials option, which controls the maximum number of hyperparameter optimization (HPO) trials per algorithm.

  • New explainer (Counterfactual)

    • Added a model-agnostic counterfactual explainer for classification, regression, and anomaly detection tasks.

    • The explainer can find diverse counterfactuals for the desired prediction, while the user is able to choose which features to vary and their permitted range.

    • Counterfactual explanations can be visualized either with the What-if explainer or as a dataframe.

  • Added support for a surrogate explainer for local text explanations.

  • Updated the code to comply with Bandit security checks.

  • Added CatBoost as a new classification model.
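As one example of what a fairness metric measures, the statistical parity (demographic parity) difference compares the rate of positive predictions across the groups of a protected attribute. The following pure-Python sketch does not use or reproduce the automl.fairness.metrics API. Reducing the per-group rates to a single number by taking the largest minus the smallest rate is an assumption.

```python
from collections import defaultdict


def positive_rates(predictions, groups):
    """Fraction of positive (label 1) predictions within each protected group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def statistical_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups (0 means parity)."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())
```

For predictions `[1, 0, 1, 1, 0, 0]` over groups `["a", "a", "a", "b", "b", "b"]`, group a receives positive predictions at a rate of 2/3 and group b at 1/3, so the difference is 1/3.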

Bug fixes

  • Fixed a bug in LIME’s explanation bar chart where annotations were misplaced when feature names were stringified integers.

  • Fixed a bug where features would be placed incorrectly on plot axes when visualizing explanations for categorical features.

  • Removed internal state from explanations to reduce memory consumption.

  • Fixed a bug where dataset downcasting to int32 and float32 was only applied during training but not for doing the final fit or collecting predictions.

  • Preprocessing of datetime columns is now much faster.

  • Fixed a bug where dependencies of automl would initialize a root logger on import, preventing applications from subsequently using logging.basicConfig().

  • Fixed a bug where the AutoTune step would override the default hyperparameters even if it did not find any that performed better.

  • Propagated dataset downcasting to all relevant pipeline stages, potentially reducing memory consumption for very large datasets.

  • Changed AutoTune to also consider the default hyperparameters, as scored at the end of the feature selection step, and keep them if they performed better than the configurations AutoTune tried within the time budget.

Deprecations

  • Added deprecation warnings for the following:

    • Some attributes in the pipeline that are not publicly documented.

    • Attributes of the pipeline containing trial information, which were renamed to completed_trials_summary_ and completed_trials_detailed_. The stage column is renamed to step.

    • Optimization levels 1 and 2.

    • Dask and Spark engines and their engine options.

    • The ModelTune class.

  • To disable the warnings:

    • When calling init, set the argument check_deprecation_warnings to False.

22.4.2

Features and Improvements

  • Added support for explaining selected features in local and global permutation importance, as well as automatically detecting which features were selected by an AutoML model.
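Global permutation importance measures how much a model's score drops when a single feature's column is shuffled, which breaks that feature's relationship with the target. The minimal pure-Python sketch below includes an optional features argument that restricts the computation to a subset, such as the features an AutoML model selected. This is an illustration, not the AutoMLx explainer API. It assumes model is a callable that maps one row (a list) to a prediction and that higher scores are better.

```python
import random


def permutation_importance(model, rows, targets, score, features=None,
                           n_repeats=5, seed=0):
    """Mean drop in score when each feature column is randomly permuted.

    model:    callable mapping one row (a list of values) to a prediction
    score:    callable (y_true, y_pred) -> float, where higher is better
    features: column indices to explain (e.g. only selected features);
              defaults to every column
    """
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    baseline = score(targets, [model(r) for r in rows])
    if features is None:
        features = range(len(rows[0]))
    importances = {}
    for j in features:
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by the shuffled values.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - score(targets, [model(r) for r in permuted]))
        importances[j] = sum(drops) / n_repeats
    return importances
```

A feature the model ignores gets an importance of exactly zero, because shuffling it never changes a prediction.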

Bug fixes

  • Fixed a bug in local perturbation-based feature attribution explainers for the n_iter='auto' option that caused the iterations to be set too high.

  • Improved the running time of local feature importance explainers by batching inference calls together.
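Batching replaces one model call per perturbed sample with a single call over all perturbations. This helps when each call carries fixed overhead, such as input validation, preprocessing, or framework dispatch. The minimal sketch below uses a toy model that counts its invocations. CountingModel, predict_batch, and the perturbation scheme are hypothetical and do not describe the explainers' internals.

```python
class CountingModel:
    """Toy model that predicts the row sum and counts how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def predict_batch(self, rows):
        self.calls += 1
        return [sum(r) for r in rows]


def perturb(row, n):
    """Hypothetical local perturbation: zero out one feature at a time, cycling."""
    out = []
    for k in range(n):
        r = list(row)
        r[k % len(row)] = 0
        out.append(r)
    return out


def predictions_unbatched(model, row, n):
    # One inference call per perturbed sample: n calls in total.
    return [model.predict_batch([r])[0] for r in perturb(row, n)]


def predictions_batched(model, row, n):
    # All perturbed samples scored together: a single call, identical results.
    return model.predict_batch(perturb(row, n))
```

Both functions return the same predictions. The batched version pays the per-call overhead once instead of n times.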

22.4.1

Features and Improvements

  • Pipeline now accepts a min_class_instances input argument to manually specify the minimum number of examples every class must have when doing classification. The value of min_class_instances must be at least 2.
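The constraint described above can be sketched as a check that counts examples per class and rejects any class that falls below the threshold. check_min_class_instances is a hypothetical helper. Raising ValueError is an assumption: these notes do not say how Pipeline reports a violation.

```python
from collections import Counter


def check_min_class_instances(labels, min_class_instances=2):
    """Return per-class counts, raising if any class is below the minimum.

    Enforces the documented lower bound of 2 on min_class_instances itself;
    the ValueError type and messages are illustrative choices.
    """
    if min_class_instances < 2:
        raise ValueError("min_class_instances must be at least 2")
    counts = Counter(labels)
    too_small = {c: n for c, n in counts.items() if n < min_class_instances}
    if too_small:
        raise ValueError(f"classes with too few examples: {too_small}")
    return counts
```

With the default minimum of 2, `["a", "a", "b", "b"]` passes, while `["a", "a", "b"]` is rejected because class b has only one example.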

Bug fixes

  • Fixed a bug where IPython and ipywidgets were not properly guarded as optional dependencies, which made them required.

  • Fixed a bug, introduced by the last dependency update, that caused fbprophet to produce forecasts with an incorrect index type when fbprophet was installed manually.

22.4.0

Features and Improvements

  • New feature dependence explainers

    • Added an Accumulated Local Effects (ALE) explainer

    • ALE explanations can be computed for up to two features if at least one is not categorical.

  • New explainer (What-IF)

    • Added a What-IF explainer for classification and regression tasks

    • What-IF explanations include exploration of the behavior of an ML model on a single sample as well as on the entire dataset.

    • Sample exploration (edit a sample's values and see how the model's prediction changes) and relationship visualization (how a feature relates to predictions or to other features) are supported.

  • New feature importance aggregators

    • Added ALFI (Aggregate Local Feature Importance) that gives a visual summary of multiple local explanations.

  • New local feature importance explainer

    • Added support for surrogate-based (LIME+) local feature importance explainers
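First-order Accumulated Local Effects (ALE) for a numeric feature are computed in three steps. First, average the model's prediction difference across each small interval of the feature, using only the instances that fall inside that interval. Then accumulate these local effects. Finally, center the result. The minimal pure-Python sketch below assumes equal-width bins and a callable model. AutoMLx's binning (for example, quantile-based bins) and its two-feature and categorical handling are not reproduced here.

```python
def ale_1d(model, rows, j, n_bins=4):
    """First-order ALE curve for numeric feature j.

    Returns (bin_edges, ale_values). The values are centered so that their
    count-weighted average over bins is zero. Empty bins contribute no effect.
    """
    xs = [r[j] for r in rows]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    edges = [lo + k * width for k in range(n_bins + 1)]
    edges[-1] = hi  # avoid floating-point drift at the top edge
    effects, counts = [], []
    for k in range(n_bins):
        left, right = edges[k], edges[k + 1]
        # The first bin is closed on the left so the minimum value is included.
        in_bin = [r for r in rows
                  if (left <= r[j] if k == 0 else left < r[j]) and r[j] <= right]
        diffs = []
        for r in in_bin:
            upper = r[:j] + [right] + r[j + 1:]
            lower = r[:j] + [left] + r[j + 1:]
            diffs.append(model(upper) - model(lower))
        counts.append(len(in_bin))
        effects.append(sum(diffs) / len(diffs) if diffs else 0.0)
    # Accumulate local effects from the lowest edge upward.
    ale = [0.0]
    for e in effects:
        ale.append(ale[-1] + e)
    # Center: subtract the count-weighted mean of each bin's average ALE value.
    total = sum(counts)
    mean = sum(n * (ale[k] + ale[k + 1]) / 2 for k, n in enumerate(counts)) / total
    return edges, [a - mean for a in ale]
```

For a model that is linear in the feature, such as `2 * x0 + x1`, every local effect equals the slope times the bin width. The ALE curve is therefore a straight line, and the other feature cancels out of each difference.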

Bug fixes

  • Import failure due to CUDA: The package no longer crashes when imported on a machine with CUDA installed.

  • Fixed a bug where TorchMLPClassifier would fail when trying to predict a single instance.

  • Fixed a bug where OracleAutoMLx_Forecasting.ipynb would fail if visualization packages were not already installed.

  • Fixed a bug that caused pipeline.transform to raise an exception if a single row was passed.

  • Explanation documentation

    • Our documentation website (http://automl.oraclecorp.com/) now includes documentation for the explanation objects returned by our explainers.

  • Enhanced performance of local feature importance explainers to address long running times.

  • Improved facet visualizations for columns with a cardinality of 1 by choosing bar widths and padding appropriately.

22.3.0

Features and Improvements

  • New Explainer

    • Added support for KernelSHAP (a new feature importance tabulator), which provides fast approximations for the Shapley feature importance method.

  • Support for the ARM architecture (aarch64)

    • Released a platform-specific wheel file for ARM machines.
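KernelSHAP approximates Shapley values. For a small number of features, Shapley values can be computed exactly by enumerating every coalition: φ_i is the weighted sum, over subsets S of the other features, of |S|!(d - |S| - 1)!/d! times the change in value from adding feature i to S. The minimal pure-Python sketch below assumes a "baseline" value function, in which features absent from a coalition are replaced by reference values; this is one of several possible choices. The sketch is neither KernelSHAP nor the AutoMLx implementation, and its cost grows as 2^d in the number of features d.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values for one instance x.

    The value of a coalition S is the model output on x with every feature
    outside S replaced by the corresponding baseline value.
    """
    d = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(d)]
        return model(z)

    phi = []
    for i in range(d):
        others = [k for k in range(d) if k != i]
        total = 0.0
        for size in range(d):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

The values satisfy the efficiency property: they sum to `model(x) - model(baseline)`. For `z0 * z1 + 3 * z2` at x = [2, 3, 1] with a zero baseline, the interaction term of 6 is split equally between the first two features, and each of the three features receives 3.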

Miscellaneous

  • Clarified documentation on the accepted data formats for input datasets and added a more meaningful corresponding error message.

22.2.0

Features and Improvements

  • New profiler

    • Profiler tracks CPU and memory utilization

  • Timeseries forecasting pipeline

    • Added support for multivariate datasets

    • Added support for exogenous variables

    • Enhanced the heteroskedasticity detection technique

    • Applied a Box-Cox transform (and its inverse) with parameters estimated via maximum likelihood estimation (MLE) to handle heteroskedasticity

  • Explainers / MLX integration

    • New global text explainer

      • Added support

    • New feature importance attribution explainers

      • Added several local and global feature importance explainers, including permutation importance, exact Shapley, and SHAP-PI.

      • The explainers support classification, regression, and anomaly detection.

      • The explainers can also be configured to explain the importance of features to any model (explanation_type='observational') as well as to a particular model (explanation_type='interventional').

      • Observational explanations are supported for all tasks; interventional explanations are only supported for classification and regression.

    • New feature dependence explainers

      • Added a partial dependence plot (PDP) and individual conditional expectations (ICE) explainer

      • PDP explanations include visualization support for up to 4 dimensions. PDPs in higher dimensions can be returned as dataframes.

  • Unsupervised Anomaly Detection

    • Added N-1 Experts: a new experimental metric for UAD Model Selection

  • Documentation

    • Documented the init function of automl

    • Cleaned up documentation for more consistency among different sections and added cross-references
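An individual conditional expectation (ICE) curve traces one instance's prediction as a single feature is swept over a grid of values while the other features are held fixed. The partial dependence plot (PDP) is the pointwise average of the ICE curves. The minimal one-feature pure-Python sketch below assumes the caller supplies the grid. It does not cover the multi-dimensional PDPs mentioned above.

```python
def ice_and_pdp(model, rows, j, grid):
    """Return (ice_curves, pdp) for feature j over the given grid values.

    ice_curves[i][g] is the prediction for rows[i] with feature j set to grid[g];
    pdp[g] is the mean of ice_curves[i][g] over all rows i.
    """
    ice_curves = []
    for r in rows:
        # Hold the other features fixed and sweep feature j across the grid.
        curve = [model(r[:j] + [v] + r[j + 1:]) for v in grid]
        ice_curves.append(curve)
    pdp = [sum(c[g] for c in ice_curves) / len(ice_curves) for g in range(len(grid))]
    return ice_curves, pdp
```

Because the PDP averages the ICE curves, instances whose curves have different slopes (a sign of interaction with other features) can look uniform in the PDP. Plotting the ICE curves alongside it exposes this difference.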

Bug fixes

  • Timeseries forecasting pipeline

    • Fixed a statsmodels exception raised for some frequencies; users can now pass timeperiod as a parameter

  • Preprocessing

    • Datetime preprocessor

      • Fixed a bug related to column expansion and None/Null/NaN values

    • Standard preprocessor refitting

      • The standard preprocessor used to be fit first on a subsample of the training set and then refit at the very end of the pipeline on the full training set. This occasionally produced a different number of engineered features, so features identified during model selection could be missing after the refit. The standard preprocessor is now fit only once.

  • ONNX predictions inconsistency

    • Changed the ONNX conversion function to reduce the difference between the ONNX dumped model and the original pipeline object predictions

    • Improved ONNX conversion runtime

    • ONNX conversion now only requires a sample from the training or test set as input. This sample is used to infer the final types and shapes

Possibly breaking changes

  • Removed matplotlib as a dependency of the AutoMLx package

    • Forecasts can now be visualized only with plotly, through the same interface as before (automl.utils.plot_forecast). The alternate plotly visualizations previously provided by automl.utils.plot_forecast_interactive have been removed.

  • Updated the AutoMLx package dependencies

    • All dependency versions have been reviewed and updated to address all known CVEs

    • A few unneeded dependencies have also been removed.