Getting Started
ML Insights helps users throughout their models' lifecycle, from conception all the way to post-production monitoring. Insights does this through a component-based architecture, which makes the library easy to use, highly customizable, and reusable. All of the components are also interface-based, which makes them easy to extend.
This section introduces the components available to the user, the responsibility each of them serves, and the basics of how to use them. It provides a general overview; for more details, see the sub-documents.
Builder Object
At the very top layer of the framework is the builder object. Its responsibility is to accumulate the components provided by the user and to run validation logic: it checks that all mandatory components are provided and performs component-level validation.
At a minimum, the builder expects either a reader component or a pre-built dataframe object to ingest the data to be evaluated. As of the current version, it also expects the schema of the data. Each component described here can be passed to the builder object. After all the mandatory and optional components have been provided, call the build API. If there are no errors, the API returns the runner object.
The builder object accepts only component interfaces. Hence, any custom implementation can be passed safely to the builder as long as it implements the interface correctly.
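The validation flow described above can be sketched in plain Python. This is a minimal sketch, assuming a reader interface, a dictionary schema, and a fluent `with_*` style; `Reader`, `InMemoryReader`, `Builder`, `with_reader`, `with_schema`, and `Runner` are hypothetical names used for illustration, not the mlm-insights API. It shows the two kinds of validation: component-level checks when a component is added, and mandatory-component checks when `build()` is called.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class Reader(ABC):
    """Hypothetical reader interface: any implementation of read() is accepted."""

    @abstractmethod
    def read(self) -> List[Dict[str, Any]]:
        ...


class InMemoryReader(Reader):
    """A trivial custom reader that returns the rows it was constructed with."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def read(self) -> List[Dict[str, Any]]:
        return list(self.rows)


class Runner:
    """Hypothetical runner produced by a successful build."""

    def __init__(self, reader: Reader, schema: Dict[str, str]) -> None:
        self.reader = reader
        self.schema = schema


class Builder:
    """Hypothetical builder: accumulates components, validates them, returns a Runner."""

    def __init__(self) -> None:
        self._reader: Optional[Reader] = None
        self._schema: Optional[Dict[str, str]] = None

    def with_reader(self, reader: Reader) -> "Builder":
        # Component-level validation: only implementations of the interface are allowed.
        if not isinstance(reader, Reader):
            raise TypeError("reader must implement the Reader interface")
        self._reader = reader
        return self

    def with_schema(self, schema: Dict[str, str]) -> "Builder":
        self._schema = schema
        return self

    def build(self) -> Runner:
        # Mandatory-component validation runs before the runner is created.
        required = (("reader", self._reader), ("schema", self._schema))
        missing = [name for name, value in required if value is None]
        if missing:
            raise ValueError(f"missing mandatory components: {', '.join(missing)}")
        return Runner(self._reader, self._schema)
```

Because `with_reader` checks only against the interface, any custom `Reader` subclass is accepted, which mirrors the interface-based design described above.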
Runner Object
The runner component represents the core of the framework and is responsible for holding the execution contracts of all its child components. While each child component can be passed as an interface with any implementation logic, the runner decides the order in which the components run, when they run, and when they are initialized or destroyed. In essence, the runner object controls the entire life cycle of each child component passed through the builder.
The runner object also handles other responsibilities, such as thread pool management and execution engine abstraction.
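To make the runner's role concrete, here is a minimal sketch of a runner that fixes the execution order (read, transform, compute metrics, post-process) and uses a thread pool for metric computation. The `Runner` class, its constructor parameters, and the use of plain callables in place of component interfaces are assumptions for illustration; they are not the mlm-insights runner API, and a real runner also manages component initialization and teardown, which this sketch omits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class Runner:
    """Hypothetical runner: owns the order in which its child components execute."""

    def __init__(
        self,
        read: Callable[[], List[Row]],
        transformers: List[Callable[[List[Row]], List[Row]]],
        metrics: Dict[str, Callable[[List[Row]], Any]],
        post_processors: List[Callable[[Dict[str, Any]], None]],
        max_workers: int = 4,
    ) -> None:
        self.read = read
        self.transformers = transformers
        self.metrics = metrics
        self.post_processors = post_processors
        self.max_workers = max_workers

    def run(self) -> Dict[str, Any]:
        # 1. Ingest: the reader always runs first.
        rows = self.read()
        # 2. Transform: transformers run in the order they were provided.
        for transform in self.transformers:
            rows = transform(rows)
        # 3. Compute: metrics are independent, so they are evaluated on a thread pool.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            futures = {name: pool.submit(fn, rows) for name, fn in self.metrics.items()}
            metric_set = {name: future.result() for name, future in futures.items()}
        # 4. Post-process: actions receive only the metric set, never the raw rows.
        for action in self.post_processors:
            action(metric_set)
        return metric_set
```

Centralizing the ordering in one place is what lets each component stay a simple, swappable implementation of its interface.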
Schema
A schema defines the structure and metadata of the input data, including the data type, column type, and column mappings. As of the current version, this information is mandatory. The framework uses it as the ground truth, and any deviation in the actual data is treated as an anomaly. The framework usually ignores such anomalies in the data.
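The following minimal sketch shows schema-as-ground-truth checking on rows represented as dictionaries. The schema format (a column name mapped to a Python type), the `apply_schema` function, and the choice to drop anomalous values are assumptions for illustration, not the framework's actual schema model.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"age": int, "income": float, "segment": str}


def apply_schema(rows: List[Row], schema: Dict[str, type]) -> Tuple[List[Row], List[Tuple[int, str]]]:
    """Hypothetical check: keep schema-conforming values, record (row index, column) deviations."""
    cleaned: List[Row] = []
    anomalies: List[Tuple[int, str]] = []
    for i, row in enumerate(rows):
        kept: Row = {}
        for column, expected in schema.items():
            value = row.get(column)
            # Strict type check: a missing column or a value of another type is an anomaly.
            if type(value) is expected:
                kept[column] = value
            else:
                anomalies.append((i, column))
        # Anomalous values are dropped (ignored) rather than failing the run.
        cleaned.append(kept)
    return cleaned, anomalies
```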
Note
In upcoming versions, the schema will become an optional parameter, and users can choose other strategies, such as automatic schema inference.
Reader Component
The first component is the Reader component, which ingests raw data into the framework. It is primarily responsible for understanding different data formats (for example, jsonl and csv) and how to read them properly. Given a set of valid file locations that represent files of a specific type, the reader decodes their content and loads it into memory.
The Data Source component is an optional subcomponent that is usually used alongside the reader. It is responsible for fetching the list of file locations to read from.
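A reader that decodes csv and jsonl files from a list of file locations could look like the following minimal sketch. `FileReader` and its `file_type` parameter are hypothetical names, not the mlm-insights reader API, and a real reader would typically load data into a data frame rather than a list of dictionaries.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List

Row = Dict[str, Any]


def _read_csv(path: Path) -> List[Row]:
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def _read_jsonl(path: Path) -> List[Row]:
    with path.open(encoding="utf-8") as f:
        # One JSON object per non-empty line.
        return [json.loads(line) for line in f if line.strip()]


class FileReader:
    """Hypothetical reader: decodes files of one format and loads them into memory."""

    _decoders = {"csv": _read_csv, "jsonl": _read_jsonl}

    def __init__(self, locations: List[str], file_type: str) -> None:
        if file_type not in self._decoders:
            raise ValueError(f"unsupported file type: {file_type}")
        self.locations = [Path(p) for p in locations]
        self.decode = self._decoders[file_type]

    def read(self) -> List[Row]:
        rows: List[Row] = []
        for path in self.locations:
            rows.extend(self.decode(path))
        return rows
```

Note that `csv.DictReader` yields every value as a string, which is one reason a schema is useful when interpreting csv data.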
Data Source Component
The Data Source component interacts with a specific data source and returns a list of locations to be read. For example, if Insights needs to fetch data from OCI Object Storage, an ObjectStorageFileSearchDataSource is used, which returns a list of objects in a specific bucket.
The end result of the component is a list of URLs.
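As an illustration of the contract (a search that produces a list of URLs), the following minimal sketch searches a local directory instead of OCI Object Storage. `LocalFileSearchDataSource` and `get_data_locations` are hypothetical names; this is not `ObjectStorageFileSearchDataSource` and it does not call any OCI API.

```python
from pathlib import Path
from typing import List


class LocalFileSearchDataSource:
    """Hypothetical data source: finds matching files under a directory, returns their URLs."""

    def __init__(self, root: str, pattern: str = "*.csv") -> None:
        self.root = Path(root)
        self.pattern = pattern

    def get_data_locations(self) -> List[str]:
        # rglob searches recursively; sorting keeps the output deterministic.
        matches = sorted(p for p in self.root.rglob(self.pattern) if p.is_file())
        # as_uri() requires an absolute path, hence resolve().
        return [p.resolve().as_uri() for p in matches]
```

Keeping the search separate from decoding means the same reader can consume locations from any data source that returns a list of URLs.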
Profile Reader Component
The Profile Reader component reads an Insights Profile and returns the deserialized Profile. It can read from the local file system and from OCI Object Storage locations. Users typically use the Profile Reader to load a Reference or Baseline Profile when doing a Prediction run or when executing Insights Tests.
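The following minimal sketch shows the read-and-deserialize step for a profile stored on the local file system. The JSON layout (a top-level `features` mapping), the `Profile` class, and `LocalProfileReader` are assumptions for illustration; the actual Insights Profile format and Profile Reader API may differ, and reading from OCI Object Storage is not shown.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict


@dataclass
class Profile:
    """Hypothetical profile: per-feature metric results from an earlier run."""
    features: Dict[str, Dict[str, Any]] = field(default_factory=dict)


class LocalProfileReader:
    """Hypothetical profile reader for the local file system (JSON format assumed)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def read(self) -> Profile:
        # Deserialize the stored JSON into a Profile object.
        data = json.loads(self.path.read_text(encoding="utf-8"))
        return Profile(features=data["features"])
```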
Config Writer Component
The Config Writer component lets users convert the builder object into a configuration JSON. By providing the optional monitor ID and storage details, you can generate the configuration JSON needed by the ml-monitoring application, and this configuration can be saved to Object Storage. This component helps you author configurations for the mlm-insights library and the ml-monitoring application, and guarantees that the configurations contain components with the correct parameters and values.
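A minimal sketch of the idea follows: validate the accumulated components, then serialize them together with the optional monitor ID and storage details. The function `write_config` and the JSON keys `components`, `monitor_id`, and `storage_details` are hypothetical and do not reflect the actual mlm-insights or ml-monitoring configuration schema.

```python
import json
from typing import Any, Dict, List, Optional


def write_config(
    components: Dict[str, List[Dict[str, Any]]],
    monitor_id: Optional[str] = None,
    storage_details: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical writer: serialize builder components into a JSON configuration string."""
    # Validate every component first, so an invalid configuration is never emitted.
    for kind, entries in components.items():
        for entry in entries:
            if not isinstance(entry.get("type"), str) or not isinstance(entry.get("params", {}), dict):
                raise ValueError(f"invalid {kind} component: {entry!r}")
    config: Dict[str, Any] = {"components": components}
    # The optional monitor ID and storage details are included only when provided.
    if monitor_id is not None:
        config["monitor_id"] = monitor_id
    if storage_details is not None:
        config["storage_details"] = storage_details
    return json.dumps(config, indent=2, sort_keys=True)
```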
Transformer Component
A transformer is an optional component that can be used to modify, normalize, or extend the input data frame. Typically, transformers format or normalize the data before the data frame is sent for metric computation. Some example use cases for transformers are:
- Adding a new column to the input data frame based on existing columns. This can also be used to convert unstructured data into structured data.
- Verifying that specific columns are present in the data frame.
You can chain multiple transformers to operate on the input data frame and produce a final data frame. You can also write your own transformer by extending the transformer interface and implementing custom transformation logic for your data.
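The two use cases above, and chaining, can be sketched as follows. This minimal sketch assumes a list of dictionaries as a stand-in for a data frame; `Transformer`, `AddColumnTransformer`, `RequireColumnsTransformer`, and `apply_chain` are hypothetical names, not the mlm-insights transformer interface.

```python
from abc import ABC, abstractmethod
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]  # a list of rows stands in for a data frame in this sketch


class Transformer(ABC):
    """Hypothetical transformer interface."""

    @abstractmethod
    def transform(self, rows: List[Row]) -> List[Row]:
        ...


class AddColumnTransformer(Transformer):
    """Derives a new column from existing ones."""

    def __init__(self, name: str, fn: Callable[[Row], Any]) -> None:
        self.name, self.fn = name, fn

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**row, self.name: self.fn(row)} for row in rows]


class RequireColumnsTransformer(Transformer):
    """Verifies that the given columns are present; passes the data through unchanged."""

    def __init__(self, columns: Iterable[str]) -> None:
        self.columns = set(columns)

    def transform(self, rows: List[Row]) -> List[Row]:
        for i, row in enumerate(rows):
            missing = self.columns.difference(row)
            if missing:
                raise KeyError(f"row {i} is missing columns: {sorted(missing)}")
        return rows


def apply_chain(rows: List[Row], chain: List[Transformer]) -> List[Row]:
    # Each transformer receives the output of the previous one.
    return reduce(lambda acc, t: t.transform(acc), chain, rows)
```

Because each transformer only consumes and returns rows, custom transformers can be inserted anywhere in the chain.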
Metric Component
The metric component is the core construct of the framework. It is responsible for calculating all statistical metrics and running algorithms. Metric components work based on the type of the available features (for example, input feature or output feature), their data type (for example, int, float, or string), and any extra context (for example, whether a previous computation is available to compare against). ML Insights provides commonly used metrics out of the box for different ML observability use cases.
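The following minimal sketch shows how a metric can use a feature's data type to decide whether it applies, and how extra context (a reference result) changes its output. `FeatureInfo`, `MeanMetric`, `applies_to`, and `compute` are hypothetical names, not the mlm-insights metric interface.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Any, Dict, List, Optional


@dataclass
class FeatureInfo:
    """Hypothetical feature metadata."""
    name: str
    feature_type: str  # for example, "input" or "output"
    data_type: type    # for example, int, float, or str


class MeanMetric:
    """Hypothetical metric: mean of a numeric feature, compared to a reference if available."""

    supported_types = (int, float)

    def applies_to(self, feature: FeatureInfo) -> bool:
        # Applicability is decided from the feature's data type.
        return feature.data_type in self.supported_types

    def compute(self, values: List[float], reference: Optional[Dict[str, float]] = None) -> Dict[str, Any]:
        result: Dict[str, Any] = {"mean": fmean(values)}
        # Extra context: a previous computation to compare against.
        if reference is not None and "mean" in reference:
            result["delta_from_reference"] = result["mean"] - reference["mean"]
        return result
```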
Post Processor Component
Post Processor is a flexible component that can extend the framework in different ways. Its most common use case is to provide additional integration points; for example, post processors can write the metric set output to a storage system. Given their open-ended nature, however, they can be used for any scenario, such as notifications or alerts.
Post processors are a set of actions that rely on the metric set output of the framework. There is no limitation on the kind of action these components can take, so they can call any third-party service or process the metric set output. Post processors do not have access to the raw data, so they cannot manipulate it in any way (including writing it to any storage).
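The following minimal sketch shows two post processors that act only on the metric set: one writes it to storage (a local file here) and one raises a notification through a callback. `PostProcessor`, `JsonFileWriter`, `ThresholdAlert`, and the nested-dictionary metric set layout are assumptions for illustration, not the mlm-insights post processor API.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Callable, Dict

MetricSet = Dict[str, Dict[str, Any]]  # feature name -> metric name -> value


class PostProcessor(ABC):
    """Hypothetical post processor interface."""

    @abstractmethod
    def process(self, metric_set: MetricSet) -> None:
        """Receives only the metric set; raw data is never passed in."""


class JsonFileWriter(PostProcessor):
    """Integration point: persist the metric set to storage (a local file here)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def process(self, metric_set: MetricSet) -> None:
        self.path.write_text(json.dumps(metric_set, sort_keys=True), encoding="utf-8")


class ThresholdAlert(PostProcessor):
    """Notification scenario: call a hook when a metric exceeds a limit."""

    def __init__(self, feature: str, metric: str, limit: float, notify: Callable[[str], None]) -> None:
        self.feature, self.metric, self.limit, self.notify = feature, metric, limit, notify

    def process(self, metric_set: MetricSet) -> None:
        value = metric_set.get(self.feature, {}).get(self.metric)
        if value is not None and value > self.limit:
            self.notify(f"{self.feature}.{self.metric}={value} exceeds {self.limit}")
```

In practice, `notify` could wrap a call to any third-party alerting service.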
Tests/Test Suites
The Insights Tests and Test Suites feature enables comprehensive validation of your machine learning models and data. It provides tests and test suites for various use cases, such as:
- Data Integrity
- Data Quality
- Model Performance (Classification, Regression)
- Drift
- Correlation, etc.
You can author tests using the ML Insights configuration (JSON) or the Python-based APIs. Test results can be used to send alerts through OCI Monitoring, enabling continuous ML monitoring.
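A test in this sense is a condition evaluated against metric results. The following minimal sketch evaluates threshold tests against a metric set and collects the results; `ThresholdTest`, `ThresholdTestResult`, `run_suite`, and the metric names used are hypothetical and do not reflect the Insights Tests configuration or Python API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

MetricSet = Dict[str, Dict[str, Any]]  # feature name -> metric name -> value


@dataclass
class ThresholdTestResult:
    name: str
    passed: bool
    detail: str


@dataclass
class ThresholdTest:
    """Hypothetical test: passes when a feature metric satisfies a comparison."""
    feature: str
    metric: str
    threshold: float
    compare: Callable[[float, float], bool]
    label: str

    def run(self, metric_set: MetricSet) -> ThresholdTestResult:
        value = metric_set[self.feature][self.metric]
        passed = self.compare(value, self.threshold)
        name = f"{self.feature}.{self.metric} {self.label} {self.threshold}"
        return ThresholdTestResult(name, passed, f"actual={value}")


def run_suite(tests: List[ThresholdTest], metric_set: MetricSet) -> List[ThresholdTestResult]:
    # A suite is simply an ordered collection of tests evaluated against one metric set.
    return [test.run(metric_set) for test in tests]
```

For example, a data integrity check on missing values could be written as `ThresholdTest("age", "missing_ratio", 0.05, operator.le, "<=")`, using `operator.le` from the standard library. Failed results are the natural input for alerting.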
Note
This feature is available in Insights versions >= 1.1.0.
- Builder Object
- How to use it
- Providing Schema
- Providing Input Data
- Providing In-memory data transformation
- Defining metrics for features
- Taking post actions
- Providing what Execution Engine to Run on
- Declaring additional metadata
- Defining Insights Test Config
- Providing a Reference Profile for Tests
- Creating the runner object
- Runner Object
- Config Writer Component
- Data Reader Component
- Data Source Component
- Profile Reader Component
- Transformer Component
- Metric Component
- Post Processor Component
- Test/Test Suites Component