mlm_insights.core.transformers.interfaces package

Submodules

mlm_insights.core.transformers.interfaces.transformer module

class mlm_insights.core.transformers.interfaces.transformer.Transformer

Bases: ABC

Abstract Base Class for defining an Insights Transformer.

Write a custom transformer to modify the input data frame, for example by adding a new column, deriving new columns from existing ones, encoding columns, flattening columns, or removing unused columns.

Multiple transformers can be chained to operate on the input data frame and produce a final data frame.

A derived class must implement the abstract methods create, transform, and get_output_schema.

abstract classmethod create(config: Any) Transformer

Factory method that creates a transformer. The configuration is passed in config.

Returns

Transformer

An instance of Transformer.

get_output_schema(input_schema: Schema, **kwargs: Any) Schema

Override this method if the transform method changes the schema of the input data frame, for example by adding or removing columns. This requires familiarity with the PyArrow Schema API; see https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html

For example, if transform adds a column of type float to the input data frame, the method can look like this:

import pyarrow as pa
def get_output_schema(self, input_schema: pa.Schema, **kwargs) -> pa.Schema:
    # Append a 64-bit float column to the incoming schema
    return input_schema.append(pa.field('column_name', pa.float64()))

Parameters

input_schema: pa.Schema

Schema of the input data frame

Returns

output_schema: pa.Schema

Modified schema

abstract transform(data_frame: DataFrame, **kwargs: Any) DataFrame

Transform the input data frame to produce a new data frame with the modifications. If this method changes the schema of the data frame, you must also override get_output_schema to return the modified schema.

Parameters

data_frame: pd.DataFrame

Data Frame to operate on

Returns

output_data_frame: pd.DataFrame

Transformed Data Frame