Release Notes
1.1.0
Release notes: October 16, 2024
Features and Improvements
- This release adds a new reader and a post processor for OCI Autonomous Data Warehouse (ADW). A configuration sketch follows this list.
ADWApplicationDataReader: A new reader that reads data from Oracle ADW tables using SQL queries. It supports reading large volumes of data using static and dynamic partitioning, bind variables, and mTLS authentication, and it can override the start-date and end-date bind parameters in SQL queries using runtime parameters.
SaveMetricToOracleADWPostProcessor: A new post processor that writes the monitoring results (monitor details, metrics, and test results) to a set of tables in an OCI ADW database.
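For illustration, these components might be wired into the application configuration roughly as follows. The class names come from this release; every key and parameter name below (query, bind_variables, wallet_location, partition_column, table_prefix) is an illustrative assumption, not the documented schema.

```python
# Hypothetical sketch of configuring the new ADW components.
# Class names are from these release notes; all keys/parameters are assumptions.
adw_config_fragment = {
    "reader": {
        "type": "ADWApplicationDataReader",
        "params": {
            # Oracle-style bind variables; start/end can be overridden at runtime
            "query": ("SELECT * FROM predictions "
                      "WHERE ts BETWEEN :start_date AND :end_date"),
            "bind_variables": {"start_date": "2024-10-01", "end_date": "2024-10-15"},
            "wallet_location": "oci://bucket@namespace/wallet.zip",  # mTLS wallet (assumed key)
            "partition_column": "ts",  # static/dynamic partitioning (assumed key)
        },
    },
    "post_processors": [
        {
            "type": "SaveMetricToOracleADWPostProcessor",
            "params": {"table_prefix": "ML_MONITORING"},  # assumed key
        },
    ],
}
```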
1.0.0
Release notes: August 12, 2024
Features and Improvements
- Test/Test Suites: The Test/Test Suites feature enables comprehensive validation of a customer's machine learning models and data via a suite of tests and test suites of various types. A sketch of a test configuration follows this item.
TEST_CONFIG_LOCATION: Introduces a new runtime parameter, TEST_CONFIG_LOCATION, which allows users to provide the Insights Test/Test Suite configuration as a separate configuration JSON.
Enhanced SaveMetricOutputAsJsonPostProcessor to persist test results.
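For illustration, a standalone test configuration referenced via TEST_CONFIG_LOCATION might look roughly like the following; the structure, key names, and the test name TestGreaterThan are assumptions rather than the documented schema.

```python
# Hypothetical sketch of a separate Test/Test Suite configuration,
# stored as JSON and referenced by the TEST_CONFIG_LOCATION runtime
# parameter. All keys and the test name are illustrative assumptions.
test_config = {
    "tests": [
        {
            "test_name": "TestGreaterThan",  # assumed test name
            "metric": "Mean",                # metric produced during Profile computation
            "feature": "age",
            "threshold": 30,                 # breach if Mean(age) <= 30
        },
    ],
}
```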
- Monitoring Notification: The Test or Test Suites component lets users configure Insights tests that check whether the metric results produced during Profile computation breach a particular threshold. Users are notified of any detected threshold breaches in a way that is easy to visualize and track. ML Application provides a Notification feature to push threshold breaches configured using Test or Test Suites to the OCI Monitoring service.
This release introduces a new post processor, OCI Monitoring Application Post Processor (OCIMonitoringApplicationPostProcessor), to push test results to OCI Monitoring. This forms the basis of the "Monitoring Notifications" feature, which allows data scientists to continually monitor data and model health. A minimal configuration sketch follows.
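A minimal sketch of enabling this post processor, assuming hypothetical key and parameter names (only the class name OCIMonitoringApplicationPostProcessor appears in this release):

```python
# Hypothetical sketch: pushing test results to the OCI Monitoring service.
# The class name is from these release notes; the parameter names
# (namespace, compartment_id) are assumptions.
notification_fragment = {
    "post_processors": [
        {
            "type": "OCIMonitoringApplicationPostProcessor",
            "params": {
                "namespace": "ml_monitoring",                  # assumed: custom metrics namespace
                "compartment_id": "ocid1.compartment.oc1...",  # assumed parameter name
            },
        },
    ],
}
```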
- Metrics: This release introduces the following new metrics (a configuration example follows the list):
CorrelationRatio: Computes the correlation matrix for a set of categorical and numerical features. With this addition, Insights now provides three correlation metrics for different combinations of numerical and categorical features, the other two being PearsonCorrelation and CramersVCorrelation.
DateTimeMin: Computes the minimum date-time value in a feature. Supports date-time string and timestamp values.
DateTimeMax: Computes the maximum date-time value in a feature. Supports date-time string and timestamp values.
DateTimeDuration: Computes the longest duration for a date-time feature, i.e., max date - min date. Supports date-time string and timestamp values.
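For example, the new metrics might be requested in a configuration fragment like the following; the metric names come from this release, while the surrounding structure and key names are assumptions.

```python
# Hypothetical sketch: requesting the new metrics for a date-time
# feature and across the dataset. Metric class names are from these
# release notes; the configuration structure is an assumption.
metrics_fragment = {
    "feature_metrics": {
        "order_date": [
            {"type": "DateTimeMin"},
            {"type": "DateTimeMax"},
            {"type": "DateTimeDuration"},  # max date - min date
        ],
    },
    "dataset_metrics": [
        {"type": "CorrelationRatio"},  # categorical/numerical correlation matrix
    ],
}
```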
0.0.3
Release notes: May 15, 2024
Features and Improvements
- CONFIG_LOCATION: The HTTP location of the application configuration in OCI Object Storage; a mandatory parameter for an ODSC Job Run to kick off a run.
- Application Configuration: The ML Monitoring Application can be set up and customized by authoring a JSON configuration. The configuration then needs to be saved to an object store location and passed in the CONFIG_FILE variable of RUNTIME_PARAMETER when starting a job run. A skeleton of the configuration appears after this list.
Monitor Id: A user-provided ID that uniquely identifies a monitor configuration.
Storage Details: The type and location of storage used to retrieve the baseline profile (in the case of a prediction run) and to persist the internal state of a run.
Input Schema: A map of features to their data types, variable types, and column types.
Reader: Readers (Baseline/Prediction) are responsible for reading data in a specific format. Currently, all of the readers mentioned can read from the local file system and OCI Object Storage. The readers supported in this release include the CSV Dask Data Reader, JSONL Dask Data Reader, and Nested JSON Dask Data Reader.
Data Source: The data source component lets you specify where the data reader should read data from. It supports attributes such as File Type and File Path. The supported data sources are Local Date Prefix Data Source, Local File Data Source, OCI Date Prefix Data Source, and OCI Object Storage Data Source.
Transformer: The transformer component provides an easy way to do simple in-memory transformations on the input data.
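Putting these pieces together, a skeleton of the application configuration might look roughly as follows (the metric and post-processor sections are sketched after the metric list below). The component concepts come from these notes; every key name, and the reader and data-source class-name spellings used here, are illustrative assumptions.

```python
# Hypothetical skeleton of the JSON application configuration, shown as
# a Python dict. Key names and exact class-name spellings are assumptions;
# the components themselves (monitor id, storage details, input schema,
# reader, data source) are described in these release notes.
app_config = {
    "monitor_id": "customer_churn_monitor",  # user-provided unique id
    "storage": {
        "type": "ObjectStorage",                              # assumed key/value
        "location": "oci://bucket@namespace/monitor_state/",  # state + baseline profile
    },
    "input_schema": {
        "age": {
            "data_type": "INTEGER",        # assumed value names
            "variable_type": "CONTINUOUS",
            "column_type": "INPUT",
        },
    },
    "reader": {
        "type": "CSVDaskDataReader",  # assumed spelling of "CSV Dask Data Reader"
        "data_source": {
            "type": "OCIObjectStorageDataSource",  # assumed spelling
            "file_path": "oci://bucket@namespace/input_data/",
        },
    },
}
```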
- Metrics: Metric components are responsible for calculating all of the statistical metrics and algorithms over the data. Multiple metric types are supported in this ML Insights release. The supported metrics include:
Feature Metrics: (Count, Distinct Count, Duplicate Count, Frequency Distribution, Inter Quartile Range, Is Constant Feature, Is Quasi Constant Feature, Kurtosis, Max, Mean, Min, Mode, Probability Distribution, Quartiles, Range, Skewness, Standard Deviation, Sum, Top K Frequent Elements, Type Metric, Variance, IsPositive, IsNegative, IsNonZero, Percentiles)
Model Performance Metrics: (Row Count, Mean Absolute Error, Mean Squared Error, R2 Score, Root Mean Squared Error, Mean Squared Log Error, Mean Absolute Percentage Error, Max Error, Conflict Prediction, Conflict Label)
Data Quality Metrics: (CramersVCorrelation, Pearson Correlation)
Classification Metrics: (Accuracy Score, Precision Score, Recall Score, FBeta Score, Log Loss, False Positive Rate, False Negative Rate, Specificity, Confusion Matrix, ROC Curve, ROC Area Under Curve, Precision Recall Curve, Precision Recall Area Under Curve)
Drift Metrics: (Jensen Shannon, KullbackLeibler, Population Stability Index, Kolmogorov Smirnov, ChiSquare)
Bias and Fairness: (Class Imbalance)
- Post-Processor Component: Post processors are responsible for running any action after the entire dataset is processed and all the metrics are calculated. The ML Monitoring Application supports SaveMetricOutputAsJsonPostProcessor (saves the result of a run in JSON format). A continuation of the configuration sketch follows.
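Continuing the skeleton above, the metric and post-processor sections might be declared like this; the metric and post-processor names appear in this release, while the key names and exact class-name spellings are assumptions.

```python
# Hypothetical continuation of the configuration skeleton: metrics and
# post processors. Key names and spellings are assumptions; the metric
# and post-processor names are listed in these release notes.
metrics_and_post_processors = {
    "feature_metrics": {
        "age": [{"type": "Mean"}, {"type": "StandardDeviation"}],  # assumed spellings
    },
    "dataset_metrics": [
        {"type": "RowCount"},  # assumed spelling of "Row Count"
    ],
    "post_processors": [
        {
            "type": "SaveMetricOutputAsJsonPostProcessor",
            "params": {"location": "oci://bucket@namespace/results/"},  # assumed key
        },
    ],
}
```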
- RUNTIME_PARAMETER: Parameters supplied at runtime when starting a job run. The supported parameters include:
- ACTION_TYPE: To run the ML Monitoring Application, the user must define one Action Type for each run. Currently, three Action Types are supported in the ML Monitoring Application:
RUN_CONFIG_VALIDATION: An ML Monitoring Application Run can be defined as a Config Validation Run when one wants to validate the application configuration provided as input to the application.
RUN_BASELINE: An ML Monitoring Application Run can be defined as a Baseline Run when one wants to compute the baseline profile against which later Prediction Runs are compared, using the ActionType RUN_BASELINE.
RUN_PREDICTION: An ML Monitoring Application Run can be defined as a Prediction Run when one wants to calculate the performance drift with respect to a Baseline Run, using the ActionType RUN_PREDICTION.
DATE_RANGE: The DATE_RANGE parameter will override the start and end filters of the ObjectStorageFileSearchDataSource and SaveMetricOutputAsJsonPostProcessor present in the application configuration.
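As a rough illustration, the runtime parameters for a prediction run might be supplied as key-value pairs like the following. The parameter names come from these notes; the DATE_RANGE value format and the mechanism for attaching the parameters to an ODSC Job Run are assumptions.

```python
# Hypothetical sketch: runtime parameters for a prediction run.
# Parameter names (CONFIG_LOCATION, ACTION_TYPE, DATE_RANGE) are from
# these release notes; the DATE_RANGE value format is an assumption.
runtime_parameters = {
    "CONFIG_LOCATION": "https://objectstorage.us-ashburn-1.oraclecloud.com"
                       "/n/namespace/b/bucket/o/monitor_config.json",
    "ACTION_TYPE": "RUN_PREDICTION",
    # Overrides the start/end filters of ObjectStorageFileSearchDataSource
    # and SaveMetricOutputAsJsonPostProcessor (value format assumed).
    "DATE_RANGE": '{"start": "2024-05-01", "end": "2024-05-15"}',
}
```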