mlm_insights.core.transformers.interfaces package¶
Submodules¶
mlm_insights.core.transformers.interfaces.transformer module¶
- class mlm_insights.core.transformers.interfaces.transformer.Transformer¶
Bases: ABC
Abstract Base Class for defining an Insights Transformer.
Write a custom transformer to modify the input data frame, for example by adding a new column, deriving new columns from existing ones, encoding columns, flattening columns, or removing unused columns.
Multiple transformers can be chained to operate on the input data frame and produce a final data frame.
A derived class must implement the abstract methods create and transform, and must also override get_output_schema if transform changes the schema of the data frame.
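The chaining described above can be sketched as a plain loop in which each transformer receives the output of the previous one. This is only an illustration: the Transformer class below is a local stand-in that mirrors the documented transform signature, AddTotal, DropColumn and apply_chain are hypothetical names, and the way the library itself wires a chain together is not shown in this reference.

```python
from abc import ABC, abstractmethod
from typing import Any, List

import pandas as pd


class Transformer(ABC):
    """Local stand-in mirroring the documented interface (assumption)."""

    @abstractmethod
    def transform(self, data_frame: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
        ...


class AddTotal(Transformer):
    # Hypothetical transformer: derives a new column from two existing ones.
    def transform(self, data_frame: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
        return data_frame.assign(total=data_frame["a"] + data_frame["b"])


class DropColumn(Transformer):
    # Hypothetical transformer: removes one unused column.
    def __init__(self, column: str) -> None:
        self.column = column

    def transform(self, data_frame: pd.DataFrame, **kwargs: Any) -> pd.DataFrame:
        return data_frame.drop(columns=[self.column])


def apply_chain(transformers: List[Transformer], data_frame: pd.DataFrame) -> pd.DataFrame:
    # Feed each transformer's output into the next one, in order.
    for transformer in transformers:
        data_frame = transformer.transform(data_frame)
    return data_frame


frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
result = apply_chain([AddTotal(), DropColumn("b")], frame)
```

Both hypothetical transformers return a new data frame instead of mutating their input, so the original frame is still usable after the chain has run.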
- abstract classmethod create(config: Any) Transformer ¶
Factory method that creates a transformer. The configuration for the transformer is passed in config.
Returns¶
- Transformer
An instance of Transformer.
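A minimal sketch of a create implementation follows, assuming config arrives as a dict. The Transformer class here is a local stand-in for the documented base class, and ScaleTransformer together with the "column" and "factor" keys are hypothetical; the reference only types config as Any, so its real shape depends on how the transformer is registered.

```python
from abc import ABC, abstractmethod
from typing import Any


class Transformer(ABC):
    """Local stand-in for the documented base class (assumption)."""

    @classmethod
    @abstractmethod
    def create(cls, config: Any) -> "Transformer":
        ...


class ScaleTransformer(Transformer):
    # Hypothetical transformer configured with a column name and a factor.
    def __init__(self, column: str, factor: float) -> None:
        self.column = column
        self.factor = factor

    @classmethod
    def create(cls, config: Any) -> "ScaleTransformer":
        # Assumes config is a dict; the keys below are illustrative only.
        return cls(column=config["column"], factor=float(config.get("factor", 1.0)))


scaler = ScaleTransformer.create({"column": "price", "factor": 0.5})
```

Keeping the parsing of config inside create leaves the constructor with plain, typed arguments, which makes the transformer easy to build directly in tests.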
- get_output_schema(input_schema: Schema, **kwargs: Any) Schema ¶
Override this method if the transform method changes the schema of the input data frame, for example by adding or removing columns. This requires familiarity with the PyArrow Schema API; see https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html
Suppose transform adds a column of type float to the input data frame. The override can look like this:
    import pyarrow as pa

    def get_output_schema(self, input_schema: pa.Schema, **kwargs) -> pa.Schema:
        # Append the new field; float64 is used here because pyarrow has no plain float() type
        return input_schema.append(pa.field('column_name', pa.float64()))
Parameters¶
- input_schema: pa.Schema
Schema of the input data frame
Returns¶
- output_schema: pa.Schema
Modified schema
- abstract transform(data_frame: DataFrame, **kwargs: Any) DataFrame ¶
Transform the input data frame to produce a new data frame with the modifications. If this method changes the schema of the data frame, you must also override get_output_schema to return the modified schema.
Parameters¶
- data_frame: pd.DataFrame
Data Frame to operate on
Returns¶
- output_data_frame: pd.DataFrame
Transformed Data Frame