Release Notes

1.0.4

Release notes: January 15, 2024

Features and Improvements

  • Builder: The Builder object provides a core set of APIs with which users can set the behavior of their monitoring, for example which reader to use, which metrics to calculate, and which post-processor to use.

  • Config Reader: The config reader component lets the user define the monitoring behavior in a config file. It creates the builder object from that file, so the user doesn’t need to create the builder object manually.

  • Runner: The runner object runs the internal workflow. It handles the life cycle of each component passed, which includes creation (if required), invoking interface functions, and destroying them.

  • Data Reader: The Data Reader is responsible for reading data in a specific format. Currently, all the readers listed below can read from the local file system and OCI Object Storage. The readers supported in this release are CSV Native Data Reader, JSON Native Data Reader, Nested JSON Native Data Reader, CSV Dask Data Reader, JSONL Dask Data Reader, and Nested JSON Dask Data Reader.

  • Data Source: The data source component lets you specify where the data reader reads the data from. It supports attributes like File Type and File Path. The supported data sources are Local Date Prefix Data Source, Local File Data Source, OCI Date Prefix Data Source, and OCI Object Storage Data Source.

  • Transformer: The transformer component provides an easy way to do simple in-memory transformations on the input data. The Conditional Feature Transformer is supported in this release.

  • Metrics: Metric components are responsible for calculating statistical metrics and running algorithms on the data. This ML Insights release supports multiple metric types. The supported metrics include:
    • Feature Metrics: (Count, Distinct Count, Duplicate Count, Frequency Distribution, Inter Quartile Range, Is Constant Feature, Is Quasi Constant Feature, Kurtosis, Max, Mean, Min, Mode, Probability Distribution, Quartiles, Range, Skewness, Standard Deviation, Sum, Top K Frequent Elements, Type Metric, Variance)

    • Model Performance Metrics: (Row Count, Mean Absolute Error, Mean Squared Error, R2 Score, Root Mean Squared Error, Mean Squared Log Error, Mean Absolute Percentage Error, Max Error, Conflict Prediction, Conflict Label)

    • Data Quality Metrics: (CramersVCorrelation, Pearson Correlation)

    • Classification Metrics: (Accuracy Score, Precision Score, Recall Score, FBeta Score, Log Loss, False Positive Rate, False Negative Rate, Specificity, Confusion Matrix, ROC Curve, ROC Area Under Curve, Precision Recall Curve, Precision Recall Area Under Curve)

    • Drift Metrics: (Jensen Shannon, KullbackLeibler, Population Stability Index, Kolmogorov Smirnov, ChiSquare)

  • Post-Processor Component: Post processor components are responsible for running any action after all the data is processed and all the metrics are calculated. The output metrics are collectively referred to as a profile, and ML Insights supports a set of default post processors to save the profile: Local Writer Post Processor and Object Storage Writer Post Processor.

  • Customization: The SDK lets you customize runs to your ML monitoring needs. You can write a config file defining the Data Reader to use, the data location, the metrics to evaluate, and the post processor to use.

  • Built for Scale: The ML Insights library is designed to scale with dataset size. It reads data in partitions, computes metrics on each partition, and merges the partition metrics at the end, so it doesn’t load all the data into memory to calculate metrics.

  • Compute technology choice: ML Insights supports Pandas (Native), Dask, and Spark based compute for metric evaluation. You can choose the compute technology based on the scale of the data and the speed of metric evaluation needed.

  • Extensibility: The SDK provides multiple interfaces that let you add custom components for the data reading, metric evaluation, or data writing you need to perform. For example, you can write a data reader containing authentication logic to read data from an object storage location that requires the client to authenticate.
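
The partition-based approach described under "Built for Scale" can be illustrated with a small, library-independent sketch. It is not the SDK's implementation: the class MeanVarState and the functions summarize and profile are hypothetical names chosen for this example. The sketch keeps a small mergeable summary per partition (count, mean, and sum of squared deviations) and combines the summaries at the end, so no step needs all the data in memory at once.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MeanVarState:
    """Mergeable summary of one partition (hypothetical helper, not an SDK class)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Welford's online update for a single value.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def merge(self, other: "MeanVarState") -> "MeanVarState":
        # Pairwise combination of two partial summaries (Chan et al.).
        if other.count == 0:
            return MeanVarState(self.count, self.mean, self.m2)
        if self.count == 0:
            return MeanVarState(other.count, other.mean, other.m2)
        total = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return MeanVarState(total, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance; NaN when no values were seen.
        return self.m2 / self.count if self.count else float("nan")


def summarize(partition: Iterable[float]) -> MeanVarState:
    """Compute the summary for a single partition."""
    state = MeanVarState()
    for value in partition:
        state.update(value)
    return state


def profile(partitions: Iterable[Iterable[float]]) -> MeanVarState:
    """Summarize each partition independently, then merge the summaries."""
    merged = MeanVarState()
    for partition in partitions:
        merged = merged.merge(summarize(partition))
    return merged
```

Calling profile([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]) gives the same count, mean, and variance as a single pass over all six values. Metrics that cannot be reduced to a small mergeable state would need a different strategy.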

1.1.0

Release notes: April 20, 2024

Features and Improvements

  • Insights Test/Test Suites:
    Insights Test/Test Suites enable comprehensive validation of customers’ machine learning models and data through tests and test suites for use cases such as:
    • Data Integrity

    • Data Quality

    • Model Performance (Classification, Regression)

    • Drift

    • Correlation, etc.

    They provide a structured, easier way to add thresholds on metrics. These thresholds can drive notifications and alerts for continuous model monitoring, allowing customers to take remedial action.

  • Bias and Fairness: This release introduces a new Insights metric group for “Bias and Fairness Detection” with a new feature metric, “Class Imbalance”. The Class Imbalance metric measures any under-representation of sensitive groups in a categorical feature.

  • Data Source: This release introduces a new data source, ObjectStorageFileSearchDataSource. It retrieves file locations from OCI Object Storage based on an OCI file path string (or a list of OCI file path strings) and filter arguments provided by the user. Filter options are available to filter file locations by file path prefix, file path suffix, last modified date, date in the file path string, and folder names containing a given string.

  • Post-Processor Component: This release introduces a new is_critical argument that can be passed to a post processor. When set to True, the Insights run is marked as failed if the post processor execution fails. By default, the flag is set to False.

  • Metrics: This release introduces the following new metrics:
    • IsPositive: Computes whether the numerical feature has all positive values

    • IsNegative: Computes whether the numerical feature has all negative values

    • IsNonZero: Computes whether the numerical feature has all non-zero values

    • Percentiles: Computes user-provided percentiles for a numerical feature

  • Upgraded the pyarrow dependency to 14.0.1
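
The is_critical behavior described above for post processors can be pictured with a minimal sketch. This is not the SDK's API: PostProcessorStep, run_post_processors, and the status strings "SUCCEEDED" and "FAILED" are hypothetical names for illustration, and continuing with the remaining post processors after a failure is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PostProcessorStep:
    """Hypothetical wrapper pairing an action with its is_critical flag."""
    name: str
    action: Callable[[Dict[str, float]], None]
    is_critical: bool = False  # the release notes state the default is False


def run_post_processors(profile: Dict[str, float],
                        steps: List[PostProcessorStep]) -> str:
    """Run every step; return "FAILED" only if a critical step raised."""
    status = "SUCCEEDED"
    for step in steps:
        try:
            step.action(profile)
        except Exception as error:
            # A failing critical step marks the whole run as failed.
            if step.is_critical:
                status = "FAILED"
            # Every failure is reported; non-critical ones do not change the status.
            print(f"post processor {step.name!r} failed: {error}")
    return status
```

With the default is_critical=False, a failing writer is only reported and the run still counts as successful; marking the same writer as critical turns that failure into a failed run.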

Bug fixes
  • Fixed a bug so that a clearer error message is shown when the Dask installation or its dependencies have issues.

  • Fixed a bug where Classification metrics were not working for integer and float values in a feature of type TARGET or PREDICTION.

Breaking changes

No breaking changes