Builder Object¶
Builder Object provides a core set of APIs with which the user can set the behavior of their monitoring. By selecting which components and variants to run, all aspects of the monitoring task can be customised and configured. It is recommended to carefully go over each API the builder object provides to get a good understanding of the framework.
The builder object also makes it easy to discover which components it supports (through its API) and the interface type of each component. We will go over each API the object provides in detail. For details on each of the components, please refer to the dedicated sections.
How to use it¶
Below, we describe the steps the user needs to take to create a component and pass it on to the builder.
Import the right class from the ML Insights library into your init code.
Construct the implementation of the component you want to use with the right parameters.
If the builder expects a list for the constructed object, create and insert the object(s) in a list.
Pass the constructed object or list to the corresponding builder API.
The next sections describe which builder API each specific object should be passed to; the sketch below illustrates the overall pattern.
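For example, the four steps might look like the following when providing input data as a data frame. The InsightsBuilder import path here is an assumption about the package layout, and later snippets on this page extend this same builder object:

# Step 1: import the classes you need (import path is illustrative).
from mlm_insights.builder.insights_builder import InsightsBuilder
import pandas as pd

# Step 2: construct the object with the right parameters.
df = pd.DataFrame({"age": [34, 51, 29], "prediction": [1, 0, 1]})

# Steps 3-4: wrap the object in a list only if the builder API expects one;
# with_data_frame takes a single object, so pass it directly.
builder = InsightsBuilder().with_data_frame(df)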
Hint
To learn more about how to build a specific component, please refer to the dedicated page for that component.
To view a tutorial, please check the tutorials page.
Providing Schema¶
The schema is a mandatory input (as of now) to the framework. This provides the necessary metadata to the framework, for it to decide what columns to use for computation, the type of each column (whether input, output, or target), the data type (for example, int, string), and the variable type (continuous or categorical).
The schema also helps the framework decide which metrics to inject automatically if a metric list is not passed by the customer.
with_input_schema(self, input_schema: Dict[str, FeatureType]) -> "InsightsBuilder"
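A minimal sketch, assuming FeatureType is constructed from the data type, variable type, and column type described above; the import paths and enum names are assumptions and may differ across versions:

# Assumed import paths and enum names; check your Insights version.
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType

input_schema = {
    # each entry maps a column name to its metadata
    "age": FeatureType(data_type=DataType.INTEGER,
                       variable_type=VariableType.CONTINUOUS,
                       column_type=ColumnType.INPUT),
    "label": FeatureType(data_type=DataType.STRING,
                         variable_type=VariableType.NOMINAL,
                         column_type=ColumnType.TARGET),
}
builder = InsightsBuilder().with_input_schema(input_schema)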
Users can also provide a sample dataset from which an approximate input schema can be extracted. Lists of target, prediction, and prediction score features can be provided by the user to map the columns correctly for computation. The dataset_location parameter is the location of the sample dataset; only local file storage URLs are supported.
with_input_schema_using_dataset(self, dataset_location: str, target_features: List[str],
prediction_features: List[str],
prediction_score_features: List[str]) -> "InsightsBuilder"
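For example, with a local sample file (the path is a placeholder):

builder = builder.with_input_schema_using_dataset(
    dataset_location="/path/to/sample.csv",        # local file storage only
    target_features=["label"],
    prediction_features=["prediction"],
    prediction_score_features=["prediction_score"],
)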
Note
While this is a mandatory field as of now for all input types, in subsequent releases it may become optional. The metadata would then be decided based on the information available from the input data (for example, for strongly typed formats) or using an Auto Schema Detection strategy.
More details on the feature type can be found on the schema page of the documentation.
Providing Input Data¶
The input data is a mandatory parameter, as it is the data to be processed and evaluated. It can be provided in multiple ways, supporting multiple formats and storage options.
The first option is to provide a data reader component. A data reader component encapsulates all the logic needed to read a specific format from a specific storage option. The framework uses the DataReader interface to pull the data in a standardised format to be further processed.
Note
When implementing a custom reader, it is recommended that a single reader handle a single specific format. For example, if a user needs both a custom CSV reader and a custom Parquet reader, we recommend writing two separate readers, one per format. The out-of-the-box readers follow this design choice as well.
Note
Data readers are execution-engine aware, so each supported execution engine has its own versions of the readers.
with_reader(self, reader: DataReader) -> "InsightsBuilder":
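A sketch of passing a reader; the CSV reader class name and its parameter are illustrative stand-ins, pick the out-of-the-box reader that matches your format and execution engine:

from mlm_insights.core.data_sources import CSVNativeDataReader  # assumed path/name

reader = CSVNativeDataReader(file_path=["/path/to/input.csv"])  # assumed parameter
builder = builder.with_reader(reader)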
The second API available for passing input data is the data frame API. It takes in a data frame (or a variant such as a Spark data frame or a Dask data frame). This is often useful when the library is embedded within an existing application where the data frame has already been created.
with_data_frame(self, data_frame: Any) -> "InsightsBuilder":
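For example, when the application already holds a Dask data frame:

import dask.dataframe as dd

ddf = dd.read_csv("/path/to/input-*.csv")  # an existing Dask data frame
builder = builder.with_data_frame(ddf)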
Providing In-memory data transformation¶
The framework reads input data into a data frame and provides ways to transform the data in-memory. Users can run multiple transformations, for example to sanitise their data or normalise a specific column. The transformers are run in the order they are provided in the list.
with_transformers(self, transformers: List[Transformer]) -> "InsightsBuilder":
Note
Transformers are meant to be used only for simple transformations in memory. They should not be used to persist data back to any storage. By design, transformers are supposed to work on a small chunk of the overall data, so don’t run a group by on the entire data set.
One important transformer to note here is the conditional feature transformer, which is discussed in detail on its dedicated sub-page.
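As an illustration, a hypothetical custom transformer might look like the following; the base class import and the transform hook name are assumptions, the real Transformer interface is described on its dedicated page:

from typing import Any
from mlm_insights.core.transformers import Transformer  # assumed path

class LowercaseColumnTransformer(Transformer):
    """Normalises a single string column to lower case, in memory."""

    def __init__(self, column: str):
        self.column = column

    def transform(self, data_frame: Any) -> Any:  # assumed hook name
        # operate on the chunk handed in; no persistence, no full-data group-by
        data_frame[self.column] = data_frame[self.column].str.lower()
        return data_frame

builder = builder.with_transformers([LowercaseColumnTransformer("country")])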
Defining metrics for features¶
The metrics API provides a way for users to explicitly define which metrics should be calculated for specific feature(s). Metrics come in two types within the framework: univariate metrics and data set metrics. Univariate metrics are defined for a single specific feature, while data set metrics can take multiple features of different column types, variable types, or data types, where certain variations might be mandatory (for example, a confusion matrix always expects prediction and target column types).
with_metrics(self, metrics: MetricDetail) -> "InsightsBuilder":
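A hedged sketch of defining metrics explicitly; the MetricDetail fields and the metric classes below are assumptions about the API, see the metrics pages for the real structure:

from mlm_insights.builder.builder_component import MetricDetail  # assumed path
from mlm_insights.core.metrics import Mean, ConfusionMatrix      # assumed classes

metric_detail = MetricDetail(
    univariate_metric={"age": [Mean()]},   # per-feature (univariate) metrics
    dataset_metrics=[ConfusionMatrix()],   # needs prediction + target columns
)
builder = builder.with_metrics(metric_detail)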
Taking post actions¶
Post processors are actions that can be run after the entire data set has been processed and all the metrics have been calculated. Any type of logic can be implemented here, for example writing the metric result to storage, calling the API of any OCI service, or integrating with other tools (like Grafana).
Post processors don’t have access to the raw data. They only have access to the outputs of the framework, such as the profile (metric result output) and test results.
Please see section Post Processor Component for details.
with_post_processors(self, post_processors: List[PostProcessor]) -> "InsightsBuilder":
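A hypothetical post processor sketch; the base class and its hook are assumptions about the interface, see the Post Processor Component section for the real contract:

from mlm_insights.core.post_processors import PostProcessor  # assumed path

class WriteProfileToDisk(PostProcessor):
    """Persists the metric results (profile) after the run completes."""

    def process(self, context) -> None:  # assumed hook name and argument
        # only framework outputs are visible here, never the raw data
        with open("/tmp/profile.json", "w") as f:
            f.write(context.profile.to_json())  # assumed accessor

builder = builder.with_post_processors([WriteProfileToDisk()])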
Providing the Execution Engine to Run on¶
Insights runs on different execution engines, such as native pandas and Dask, and most of its components can be run in an execution-engine-agnostic manner. For example, Insights Metrics are written once, and the same code can run unmodified on different execution engines: with native pandas on a Jupyter Notebook, on ML Jobs using Dask with parallelization options, or on ML flow using Spark.
with_engine(self, engine: EngineDetail) -> "InsightsBuilder":
Note
Starting from Insights version 1.2.0, EngineDetail has been enhanced to take a custom Dask client via the engine_client property.
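For example, selecting Dask with a custom client; the EngineDetail import path and argument names are assumptions, and the engine_client property requires Insights >= 1.2.0 per the note above:

from dask.distributed import Client
from mlm_insights.builder.builder_component import EngineDetail  # assumed path

client = Client("tcp://scheduler-host:8786")  # existing Dask cluster
engine = EngineDetail(engine_name="dask",     # assumed argument names
                      engine_client=client)   # Insights >= 1.2.0
builder = builder.with_engine(engine)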
Declaring additional metadata¶
It is also possible to declare additional metadata and pass it on to the framework. This metadata is persisted along with the profile (if the profile is persisted). It is provided as free-form key-value pairs called Tags.
with_tags(self, tags: Tags) -> "InsightsBuilder":
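For example, assuming Tags wraps a plain dictionary of key-value pairs (the import path and constructor are assumptions):

from mlm_insights.builder.builder_component import Tags  # assumed path

tags = Tags({"model_name": "fraud-detector", "model_version": "2.1"})
builder = builder.with_tags(tags)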
Defining Insights Test Config¶
The Insights Tests/Test Suites feature enables comprehensive validation of a customer’s machine learning models and data. It provides a suite of tests and test suites for various types of use cases, such as:
Data Integrity
Data Quality
Model Performance (Classification, Regression)
Drift
Correlation, etc.
Please see section Test/Test Suites Component for details.
with_test_config(self, test_config: TestConfig) -> "InsightsBuilder":
Note
This feature is available on Insights versions >= 1.2.0
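A hedged sketch; the TestConfig payload and the test class shown are assumptions about the API, see the Test/Test Suites Component section for the real structure:

from mlm_insights.tests.test_config import TestConfig  # assumed path
from mlm_insights.tests import TestIsComplete          # hypothetical test class

test_config = TestConfig(tests=[TestIsComplete(feature_name="age")])  # assumed fields
builder = builder.with_test_config(test_config)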
Providing a Reference Profile for Tests¶
Users can pass a Reference/Baseline Profile when doing a prediction run or when executing Insights Tests. The reference profile can be passed using a profile reader or by passing a profile object directly.
Please see section Test/Test Suites Component for details.
with_reference_profile(self, profile: Optional[Union[Profile, ProfileReader]]) -> "InsightsBuilder":
Note
This feature is available on Insights versions >= 1.2.0
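For example, loading a baseline profile from a previous run through a profile reader implementation (the reader class name and its parameter are assumptions):

from mlm_insights.core.profile_readers import ProfileReader  # assumed path

reference = ProfileReader(location="/path/to/baseline_profile.bin")  # assumed parameter
builder = builder.with_reference_profile(reference)
# alternatively, pass a Profile object obtained in-process from an earlier run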
Creating the runner object¶
Finally, when all the mandatory components, and any optional components the user wants to use, have been provided, the build API can be called, which returns the runner object. More about the runner object can be read on the next page.
Along with building the runner object, the build API also validates each of its components and checks that all the mandatory components have been provided.
Warning
Incorrectly constructing a component or providing insufficient components may cause the build API to raise an error.
build(self) -> Runner:
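Putting it together, a run typically ends like this (the runner’s entry point shown here is an assumption; see the next page for the actual runner API):

runner = builder.build()  # validates components; raises if a mandatory one is missing
result = runner.run()     # assumed runner entry point, covered on the next page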