mlm_insights.core.metrics.regression_metrics package

Submodules

mlm_insights.core.metrics.regression_metrics.max_error module

class mlm_insights.core.metrics.regression_metrics.max_error.MaxError(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, target_column: str = 'y_true', prediction_column: str = 'y_predict', max_of_residual: float = 0.0)

Bases: DatasetMetricBase

MaxError metric computes the maximum residual error. This is a dataset-level metric.
It is an accurate (exact) metric that can process any column type but only numerical (int, float) data types.
This metric falls under the regression category and is used for predictive modeling problems that involve predicting a numeric value, or to measure the error/performance of regression models.
The ground-truth and prediction columns must not contain any NaN values; otherwise an InvalidTargetPredictionException will be thrown.
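
The underlying formula can be sketched in plain Python (an illustration of the standard definition, not the library's internal implementation), using the same house-price values as the example below:

```python
# Sketch of the max-error computation: the largest absolute residual.
# Values mirror the house-price example below.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.5, 3.8, 5.1, 4.9]

max_err = max(abs(t - p) for t, p in zip(y_true, y_pred))
print(round(max_err, 4))  # 1.1
```

Here the largest residual is |4 - 5.1| = 1.1, which matches the example output shown below.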

Configuration

None

Parameters

y_true: array-like of shape (n_samples,)

Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,)

Estimated target values.

Returns

max_error: float

A positive floating point value (the best value is 0.0).

Exceptions

  • MissingRequiredParameterException

  • InvalidTargetPredictionException

Examples

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.regression_metrics.max_error import MaxError
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
import pandas as pd

def main():
    input_schema = {
        'square_feet': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.INPUT),
        'house_price_prediction': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.PREDICTION),
        'house_price_target': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.TARGET)
    }
    data_frame = pd.DataFrame({'square_feet': [11.23, 23.45, 11.23, 45.56, 11.23],
                               'house_price_target': [1, 2, 3, 4, 5],
                               'house_price_prediction': [1.1, 2.5, 3.8, 5.1, 4.9]})
    metric_details = MetricDetail(univariate_metric={},
                                  dataset_metrics=[MetricMetadata(klass=MaxError)])

    runner = InsightsBuilder() \
        .with_input_schema(input_schema) \
        .with_data_frame(data_frame=data_frame) \
        .with_metrics(metrics=metric_details) \
        .with_engine(engine=EngineDetail(engine_name="native")) \
        .build()

    profile_json = runner.run().profile.to_json()
    dataset_metrics = profile_json['dataset_metrics']
    print(dataset_metrics["MaxError"])
    # {'value': 1.1}
if __name__ == "__main__":
    main()

Returns the standard metric result as:
{
    'metric_name': 'MaxError',
    'metric_description': 'MaxError metric computes the maximum residual error',
    'variable_count': 1,
    'variable_names': ['max_error'],
    'variable_types': [CONTINUOUS],
    'variable_dtypes': [FLOAT],
    'variable_dimensions': [0],
    'metric_data': [1.1],
    'metadata': {},
    'error': None
}
compute(dataset: DataFrame, **kwargs: Any) None

Computes the maximum residual error for the passed-in dataset

Parameters

dataset: pd.DataFrame

DataFrame object for either the entire dataset or a partition on which a Metric is being computed

classmethod create(config: Dict[str, ConfigParameter] | None = None, **kwargs: Any) MaxError

Create a MaxError metric using the configuration and kwargs

Parameters

config: Optional[Dict[str, ConfigParameter]]

Metric configuration

get_result(**kwargs: Any) Dict[str, Any]

Returns the computed value of the metric

Returns

Dict[str, Any]: Dictionary with key as string and value as any metric property.

get_standard_metric_result(**kwargs: Any) StandardMetricResult

This method returns metric output in standard format.

Returns

StandardMetricResult

max_of_residual: float = 0.0
merge(other_metric: MaxError, **kwargs: Any) MaxError

Merge two MaxError metrics into one, without mutating either.

Parameters

other_metric: MaxError

The other MaxError to be merged.

Returns

MaxError

A new instance of MaxError
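
Because the metric state is a single running maximum (the max_of_residual field), merging two partition-level results presumably reduces to taking the larger value. A sketch of these assumed semantics (hypothetical values, not the library source):

```python
# Assumed merge semantics: the merged state is the larger of the two
# partition-level running maxima (hypothetical values shown).
max_of_residual_a = 0.8  # state from partition A
max_of_residual_b = 1.1  # state from partition B

merged = max(max_of_residual_a, max_of_residual_b)
print(merged)  # 1.1
```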

prediction_column: str = 'y_predict'
target_column: str = 'y_true'

mlm_insights.core.metrics.regression_metrics.mean_absolute_error module

class mlm_insights.core.metrics.regression_metrics.mean_absolute_error.MeanAbsoluteError(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, target_column: str = 'y_true', prediction_column: str = 'y_predict', total_count: int = 0, sum_of_residuals: float = 0.0)

Bases: DatasetMetricBase

Computes Mean Absolute Error regression loss. This is a dataset-level metric.
It is an accurate (exact) metric that can process any column type but only numerical (int, float) data types.
This metric falls under the regression category and is used for predictive modeling problems that involve predicting a numeric value, or to measure the error/performance of regression models.
The ground-truth and prediction columns must not contain any NaN values; otherwise an InvalidTargetPredictionException will be thrown.
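
The underlying formula can be sketched in plain Python (an illustration of the standard definition, not the library's internal implementation), using the same house-price values as the example below:

```python
# Sketch of Mean Absolute Error: the average absolute residual.
# Values mirror the house-price example below.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.5, 3.8, 5.1, 4.9]

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mae, 4))  # 0.52
```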

Configuration

None

Parameters

y_true: array-like of shape (n_samples,) or (n_samples, n_outputs)

Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,) or (n_samples, n_outputs)

Estimated target values.

Returns

  • float: Mean Absolute Error

Exceptions

  • MissingRequiredParameterException

  • InvalidTargetPredictionException

Examples

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.regression_metrics.mean_absolute_error import MeanAbsoluteError
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
import pandas as pd


def main():
    input_schema = {
        'square_feet': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.INPUT),
        'house_price_prediction': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.PREDICTION),
        'house_price_target': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.TARGET)
    }
    data_frame = pd.DataFrame({'square_feet': [11.23, 23.45, 11.23, 45.56, 11.23],
                               'house_price_target': [1, 2, 3, 4, 5],
                               'house_price_prediction': [1.1, 2.5, 3.8, 5.1, 4.9]})
    metric_details = MetricDetail(univariate_metric={},
                                  dataset_metrics=[MetricMetadata(klass=MeanAbsoluteError)])

    runner = InsightsBuilder() \
        .with_input_schema(input_schema) \
        .with_data_frame(data_frame=data_frame) \
        .with_metrics(metrics=metric_details) \
        .with_engine(engine=EngineDetail(engine_name="native")) \
        .build()

    profile_json = runner.run().profile.to_json()
    dataset_metrics = profile_json['dataset_metrics']
    print(dataset_metrics["MeanAbsoluteError"])
if __name__ == "__main__":
    main()

Returns the standard metric result as:
{
    'metric_name': 'MeanAbsoluteError',
    'metric_description': 'Computes Mean Absolute Error regression loss',
    'variable_count': 1,
    'variable_names': ['mean_absolute_error'],
    'variable_types': [CONTINUOUS],
    'variable_dtypes': [FLOAT],
    'variable_dimensions': [0],
    'metric_data': [1.1],
    'metadata': {},
    'error': None
}
compute(dataset: DataFrame, **kwargs: Any) None

Computes the numerator of the Mean Absolute Error for the passed-in dataset

Parameters

dataset: pd.DataFrame

DataFrame object for either the entire dataset or a partition on which a Metric is being computed

classmethod create(config: Dict[str, ConfigParameter] | None = None, **kwargs: Any) MeanAbsoluteError

Create a MeanAbsoluteError metric using the configuration and kwargs

Parameters

config: Optional[Dict[str, ConfigParameter]]

Metric configuration

features_metadata: FeatureMetadata

Contains input schema for each feature, supplied as a keyword argument

get_result(**kwargs: Any) Dict[str, Any]

Returns the computed value of the metric

Returns

Dict[str, Any]: Dictionary with key as string and value as any metric property.

get_standard_metric_result(**kwargs: Any) StandardMetricResult

This method returns metric output in standard format.

Returns

StandardMetricResult

merge(other_metric: MeanAbsoluteError, **kwargs: Any) MeanAbsoluteError

Merge two MeanAbsoluteError metrics into one, without mutating either.

Parameters

other_metric: MeanAbsoluteError

The other MeanAbsoluteError to be merged.

Returns

MeanAbsoluteError

A new instance of MeanAbsoluteError

prediction_column: str = 'y_predict'
sum_of_residuals: float = 0.0
target_column: str = 'y_true'
total_count: int = 0

mlm_insights.core.metrics.regression_metrics.mean_absolute_percentage_error module

class mlm_insights.core.metrics.regression_metrics.mean_absolute_percentage_error.MeanAbsolutePercentageError(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, target_column: str = 'y_true', prediction_column: str = 'y_predict', total_count: int = 0, sum_of_relative_error: float = 0.0)

Bases: DatasetMetricBase

Mean absolute percentage error (MAPE) regression loss. This is a dataset-level metric.
It is an accurate (exact) metric that can process any column type but only numerical (int, float) data types.
This metric falls under the regression category and is used for predictive modeling problems that involve predicting a numeric value, or to measure the error/performance of regression models.
The ground-truth and prediction columns must not contain any NaN values; otherwise an InvalidTargetPredictionException will be thrown.
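
The underlying formula can be sketched in plain Python (an illustration of the standard definition, not the library's internal implementation), using the same house-price values as the example below:

```python
# Sketch of MAPE: the mean of |residual| / |target|.
# Targets must be non-zero; values mirror the house-price example below.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.5, 3.8, 5.1, 4.9]

mape = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mape, 4))  # 0.1823
```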

Configuration

None

Parameters

y_true: array-like of shape (n_samples,) or (n_samples, n_outputs)

Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,) or (n_samples, n_outputs)

Estimated target values.

Returns

  • float: Mean absolute percentage error (MAPE) regression loss

Exceptions

  • MissingRequiredParameterException

  • InvalidTargetPredictionException

Examples

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.regression_metrics.mean_absolute_percentage_error import MeanAbsolutePercentageError
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
import pandas as pd


def main():
    input_schema = {
        'square_feet': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.INPUT),
        'house_price_prediction': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.PREDICTION),
        'house_price_target': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.TARGET)
    }
    data_frame = pd.DataFrame({'square_feet': [11.23, 23.45, 11.23, 45.56, 11.23],
                               'house_price_target': [1, 2, 3, 4, 5],
                               'house_price_prediction': [1.1, 2.5, 3.8, 5.1, 4.9]})
    metric_details = MetricDetail(univariate_metric={},
                                  dataset_metrics=[MetricMetadata(klass=MeanAbsolutePercentageError)])

    runner = InsightsBuilder() \
        .with_input_schema(input_schema) \
        .with_data_frame(data_frame=data_frame) \
        .with_metrics(metrics=metric_details) \
        .with_engine(engine=EngineDetail(engine_name="native")) \
        .build()

    profile_json = runner.run().profile.to_json()
    dataset_metrics = profile_json['dataset_metrics']
    print(dataset_metrics["MeanAbsolutePercentageError"])
if __name__ == "__main__":
    main()

Returns the standard metric result as:
{
    'metric_name': 'MeanAbsolutePercentageError',
    'metric_description': 'Mean absolute percentage error (MAPE) regression loss',
    'variable_count': 1,
    'variable_names': ['mean_absolute_percentage_error'],
    'variable_types': [CONTINUOUS],
    'variable_dtypes': [FLOAT],
    'variable_dimensions': [0],
    'metric_data': [1.1],
    'metadata': {},
    'error': None
}
compute(dataset: DataFrame, **kwargs: Any) None

Computes the sum of relative errors for the Mean Absolute Percentage Error for the passed-in dataset

Parameters

dataset: pd.DataFrame

DataFrame object for either the entire dataset or a partition on which a Metric is being computed

classmethod create(config: Dict[str, ConfigParameter] | None = None, **kwargs: Any) MeanAbsolutePercentageError

Create a MeanAbsolutePercentageError metric using the configuration and kwargs

Parameters

config: Optional[Dict[str, ConfigParameter]]

Metric configuration

get_result(**kwargs: Any) Dict[str, Any]

Returns the computed value of the metric

Returns

Dict[str, Any]: Dictionary with key as string and value as any metric property.

get_standard_metric_result(**kwargs: Any) StandardMetricResult

This method returns metric output in standard format.

Returns

StandardMetricResult

merge(other_metric: MeanAbsolutePercentageError, **kwargs: Any) MeanAbsolutePercentageError

Merge two MeanAbsolutePercentageError metrics into one, without mutating either.

Parameters

other_metric: MeanAbsolutePercentageError

The other MeanAbsolutePercentageError to be merged.

Returns

MeanAbsolutePercentageError

A new instance of MeanAbsolutePercentageError

prediction_column: str = 'y_predict'
sum_of_relative_error: float = 0.0
target_column: str = 'y_true'
total_count: int = 0

mlm_insights.core.metrics.regression_metrics.mean_squared_error module

class mlm_insights.core.metrics.regression_metrics.mean_squared_error.MeanSquaredError(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, target_column: str = 'y_true', prediction_column: str = 'y_predict', total_count: int = 0, sum_of_squared_residuals: float = 0.0)

Bases: DatasetMetricBase

Computes Mean Squared Error regression loss. This is a dataset-level metric.
It is an accurate (exact) metric that can process any column type but only numerical (int, float) data types.
This metric falls under the regression category and is used for predictive modeling problems that involve predicting a numeric value, or to measure the error/performance of regression models.
The ground-truth and prediction columns must not contain any NaN values; otherwise an InvalidTargetPredictionException will be thrown.
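
The underlying formula can be sketched in plain Python (an illustration of the standard definition, not the library's internal implementation), using the same house-price values as the example below:

```python
# Sketch of Mean Squared Error: the average squared residual.
# Values mirror the house-price example below.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.5, 3.8, 5.1, 4.9]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(mse, 4))  # 0.424
```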

Configuration

None

Parameters

y_true: array-like of shape (n_samples,) or (n_samples, n_outputs)

Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,) or (n_samples, n_outputs)

Estimated target values.

Returns

  • float: Mean Squared Error

Exceptions

  • MissingRequiredParameterException

  • InvalidTargetPredictionException

Examples

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.regression_metrics.mean_squared_error import MeanSquaredError
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
import pandas as pd


def main():
    input_schema = {
        'square_feet': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.INPUT),
        'house_price_prediction': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.PREDICTION),
        'house_price_target': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.TARGET)
    }
    data_frame = pd.DataFrame({'square_feet': [11.23, 23.45, 11.23, 45.56, 11.23],
                               'house_price_target': [1, 2, 3, 4, 5],
                               'house_price_prediction': [1.1, 2.5, 3.8, 5.1, 4.9]})
    metric_details = MetricDetail(univariate_metric={},
                                  dataset_metrics=[MetricMetadata(klass=MeanSquaredError)])

    runner = InsightsBuilder() \
        .with_input_schema(input_schema) \
        .with_data_frame(data_frame=data_frame) \
        .with_metrics(metrics=metric_details) \
        .with_engine(engine=EngineDetail(engine_name="native")) \
        .build()

    profile_json = runner.run().profile.to_json()
    dataset_metrics = profile_json['dataset_metrics']
    print(dataset_metrics["MeanSquaredError"])
if __name__ == "__main__":
    main()

Returns the standard metric result as:
{
    'metric_name': 'MeanSquaredError',
    'metric_description': 'Computes Mean Squared Error regression loss',
    'variable_count': 1,
    'variable_names': ['mean_squared_error'],
    'variable_types': [CONTINUOUS],
    'variable_dtypes': [FLOAT],
    'variable_dimensions': [0],
    'metric_data': [1.1],
    'metadata': {},
    'error': None
}
compute(dataset: DataFrame, **kwargs: Any) None

Computes the numerator of the Mean Squared Error for the passed-in dataset

Parameters

dataset: pd.DataFrame

DataFrame object for either the entire dataset or a partition on which a Metric is being computed

classmethod create(config: Dict[str, ConfigParameter] | None = None, **kwargs: Any) MeanSquaredError

Create a MeanSquaredError metric using the configuration and kwargs

Parameters

config: Optional[Dict[str, ConfigParameter]]

Metric configuration

get_result(**kwargs: Any) Dict[str, Any]

Returns the computed value of the metric

Returns

Dict[str, Any]: Dictionary with key as string and value as any metric property.

get_standard_metric_result(**kwargs: Any) StandardMetricResult

This method returns metric output in standard format.

Returns

StandardMetricResult

merge(other_metric: MeanSquaredError, **kwargs: Any) MeanSquaredError

Merge two MeanSquaredError metrics into one, without mutating either.

Parameters

other_metric: MeanSquaredError

The other MeanSquaredError to be merged.

Returns

MeanSquaredError

A new instance of MeanSquaredError

prediction_column: str = 'y_predict'
sum_of_squared_residuals: float = 0.0
target_column: str = 'y_true'
total_count: int = 0

mlm_insights.core.metrics.regression_metrics.mean_squared_log_error module

class mlm_insights.core.metrics.regression_metrics.mean_squared_log_error.MeanSquaredLogError(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, target_column: str = 'y_true', prediction_column: str = 'y_predict', total_count: int = 0, sum_of_squared_log: float = 0.0)

Bases: DatasetMetricBase

Computes Mean Squared Log Error regression loss. This metric must be used when both target and prediction values are positive.
This is a dataset-level metric. It is an accurate (exact) metric that can process any column type but only numerical (int, float) data types.
This metric falls under the regression category and is used for predictive modeling problems that involve predicting a numeric value, or to measure the error/performance of regression models.
The ground-truth and prediction columns must not contain any NaN values; otherwise an InvalidTargetPredictionException will be thrown.
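
The underlying formula can be sketched in plain Python (an illustration of the standard definition, not the library's internal implementation), using the same house-price values as the example below:

```python
import math

# Sketch of Mean Squared Log Error: the mean squared difference of
# log(1 + x); requires positive targets and predictions.
# Values mirror the house-price example below.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.5, 3.8, 5.1, 4.9]

msle = sum((math.log1p(t) - math.log1p(p)) ** 2
           for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(msle, 4))  # 0.0198
```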

Configuration

None

Parameters

y_true: array-like of shape (n_samples,) or (n_samples, n_outputs)

Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,) or (n_samples, n_outputs)

Estimated target values.

Returns

  • float: Mean Squared Log Error

Exceptions

  • MissingRequiredParameterException

  • InvalidTargetPredictionException

Examples

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.regression_metrics.mean_squared_log_error import MeanSquaredLogError
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
import pandas as pd


def main():
    input_schema = {
        'square_feet': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.INPUT),
        'house_price_prediction': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.PREDICTION),
        'house_price_target': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.TARGET)
    }
    data_frame = pd.DataFrame({'square_feet': [11.23, 23.45, 11.23, 45.56, 11.23],
                               'house_price_target': [1, 2, 3, 4, 5],
                               'house_price_prediction': [1.1, 2.5, 3.8, 5.1, 4.9]})
    metric_details = MetricDetail(univariate_metric={},
                                  dataset_metrics=[MetricMetadata(klass=MeanSquaredLogError)])

    runner = InsightsBuilder() \
        .with_input_schema(input_schema) \
        .with_data_frame(data_frame=data_frame) \
        .with_metrics(metrics=metric_details) \
        .with_engine(engine=EngineDetail(engine_name="native")) \
        .build()

    profile_json = runner.run().profile.to_json()
    dataset_metrics = profile_json['dataset_metrics']
    print(dataset_metrics["MeanSquaredLogError"])
if __name__ == "__main__":
    main()

Returns the standard metric result as:
{
    'metric_name': 'MeanSquaredLogError',
    'metric_description': 'Computes Mean Squared Log Error regression loss',
    'variable_count': 1,
    'variable_names': ['mean_squared_log_error'],
    'variable_types': [CONTINUOUS],
    'variable_dtypes': [FLOAT],
    'variable_dimensions': [0],
    'metric_data': [1.1],
    'metadata': {},
    'error': None
}
compute(dataset: DataFrame, **kwargs: Any) None

Computes the numerator of the Mean Squared Log Error for the passed-in dataset

Parameters

dataset: pd.DataFrame

DataFrame object for either the entire dataset or a partition on which a Metric is being computed

classmethod create(config: Dict[str, ConfigParameter] | None = None, **kwargs: Any) MeanSquaredLogError

Create a MeanSquaredLogError metric using the configuration and kwargs

Parameters

config: Optional[Dict[str, ConfigParameter]]

Metric configuration

get_result(**kwargs: Any) Dict[str, Any]

Returns the computed value of the metric

Returns

Dict[str, Any]: Dictionary with key as string and value as any metric property.

get_standard_metric_result(**kwargs: Any) StandardMetricResult

This method returns metric output in standard format.

Returns

StandardMetricResult

merge(other_metric: MeanSquaredLogError, **kwargs: Any) MeanSquaredLogError

Merge two MeanSquaredLogError metrics into one, without mutating either.

Parameters

other_metric: MeanSquaredLogError

The other MeanSquaredLogError to be merged.

Returns

MeanSquaredLogError

A new instance of MeanSquaredLogError

prediction_column: str = 'y_predict'
sum_of_squared_log: float = 0.0
target_column: str = 'y_true'
total_count: int = 0

mlm_insights.core.metrics.regression_metrics.r2_score module

class mlm_insights.core.metrics.regression_metrics.r2_score.R2Score(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, target_column: str = 'y_true', prediction_column: str = 'y_predict', total_count: int = 0, sum_of_squared_residuals: float = 0.0)

Bases: DatasetMetricBase

Computes the R2 Score between target and prediction columns.
In the particular case when the target is constant, the R2 score is not finite: it is either NaN (perfect predictions) or -Inf (imperfect predictions). By default, these cases are replaced with 1.0 (perfect predictions) or 0.0 (imperfect predictions), respectively.
This is a dataset-level, accurate (exact) metric that can process any column type but only numerical (int, float) data types.
This metric falls under the regression category and is used for predictive modeling problems that involve predicting a numeric value, or to measure the error/performance of regression models.
The ground-truth and prediction columns must not contain any NaN values; otherwise an InvalidTargetPredictionException will be thrown.
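
The underlying formula can be sketched in plain Python (an illustration of the standard definition, not the library's internal implementation), using the same house-price values as the example below:

```python
# Sketch of the R2 score: 1 - SS_res / SS_tot.
# Values mirror the house-price example below.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.5, 3.8, 5.1, 4.9]

mean_y = sum(y_true) / len(y_true)                          # mean of targets
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(round(r2, 4))  # 0.788
```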

Configuration

None

Parameters

y_true: array-like of shape (n_samples,) or (n_samples, n_outputs)

Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,) or (n_samples, n_outputs)

Estimated target values.

Returns

  • float: R2 score

Exceptions

  • MissingRequiredParameterException

  • InvalidTargetPredictionException

Examples

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.regression_metrics.r2_score import R2Score
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
import pandas as pd


def main():
    input_schema = {
        'square_feet': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.INPUT),
        'house_price_prediction': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.PREDICTION),
        'house_price_target': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.TARGET)
    }
    data_frame = pd.DataFrame({'square_feet': [11.23, 23.45, 11.23, 45.56, 11.23],
                               'house_price_target': [1, 2, 3, 4, 5],
                               'house_price_prediction': [1.1, 2.5, 3.8, 5.1, 4.9]})
    metric_details = MetricDetail(univariate_metric={},
                                  dataset_metrics=[MetricMetadata(klass=R2Score)])

    runner = InsightsBuilder() \
        .with_input_schema(input_schema) \
        .with_data_frame(data_frame=data_frame) \
        .with_metrics(metrics=metric_details) \
        .with_engine(engine=EngineDetail(engine_name="native")) \
        .build()

    profile_json = runner.run().profile.to_json()
    dataset_metrics = profile_json['dataset_metrics']
    print(dataset_metrics["R2Score"])
if __name__ == "__main__":
    main()

Returns the standard metric result as:
{
    'metric_name': 'R2Score',
    'metric_description': 'Computes R2-Score between target and prediction columns',
    'variable_count': 1,
    'variable_names': ['r2_score'],
    'variable_types': [CONTINUOUS],
    'variable_dtypes': [FLOAT],
    'variable_dimensions': [0],
    'metric_data': [1.1],
    'metadata': {},
    'error': None
}
compute(dataset: DataFrame, **kwargs: Any) None

Computes the sum of squared residuals for the R2 Score for the passed-in dataset

Parameters

dataset: pd.DataFrame

DataFrame object for either the entire dataset or a partition on which a Metric is being computed

classmethod create(config: Dict[str, ConfigParameter] | None = None, **kwargs: Any) R2Score

Factory method to create an R2Score metric; the configuration is available in config.

Returns

R2Score

An instance of R2Score.

get_required_shareable_feature_components(**kwargs: Any) Dict[str, List[SFCMetaData]]

Returns the Shareable Feature Components that a Metric requires to compute its state and values. Metrics which do not require SFCs need not override this property.

Returns

Dict with feature_name as key and List of SFCMetadata as value. Each SFCMetadata must contain the klass attribute, which points to the SFC class

get_result(**kwargs: Any) Dict[str, Any]

Returns the computed value of the metric

Returns

Dict[str, Any]: Dictionary with key as string and value as any metric property.

get_standard_metric_result(**kwargs: Any) StandardMetricResult

This method returns metric output in standard format.

Returns

StandardMetricResult

merge(other_metric: R2Score, **kwargs: Any) R2Score

Merge two R2Score metrics into one, without mutating either.

Parameters

other_metric: R2Score

The other R2Score to be merged.

Returns

R2Score

A new instance of R2Score

prediction_column: str = 'y_predict'
sum_of_squared_residuals: float = 0.0
target_column: str = 'y_true'
total_count: int = 0

mlm_insights.core.metrics.regression_metrics.root_mean_squared_error module

class mlm_insights.core.metrics.regression_metrics.root_mean_squared_error.RootMeanSquaredError(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, target_column: str = 'y_true', prediction_column: str = 'y_predict', prediction_score_column: str = 'y_score', total_count: int = 0, sum_of_squared_residuals: float = 0.0)

Bases: DatasetMetricBase

Computes Root Mean Squared Error regression loss. This is a dataset-level metric.
It is an accurate (exact) metric that can process any column type but only numerical (int, float) data types.
This metric falls under the regression category and is used for predictive modeling problems that involve predicting a numeric value, or to measure the error/performance of regression models.
The ground-truth and prediction columns must not contain any NaN values; otherwise an InvalidTargetPredictionException will be thrown.
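
The underlying formula can be sketched in plain Python (an illustration of the standard definition, not the library's internal implementation), using the same house-price values as the example below:

```python
import math

# Sketch of Root Mean Squared Error: the square root of the mean
# squared residual. Values mirror the house-price example below.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.5, 3.8, 5.1, 4.9]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)
print(round(rmse, 4))  # 0.6512
```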

Configuration

None

Parameters

y_true: array-like of shape (n_samples,) or (n_samples, n_outputs)

Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,) or (n_samples, n_outputs)

Estimated target values.

Returns

  • float: Root Mean Squared Error

Exceptions

  • MissingRequiredParameterException

  • InvalidTargetPredictionException

Examples

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.regression_metrics.root_mean_squared_error import RootMeanSquaredError
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
import pandas as pd


def main():
    input_schema = {
        'square_feet': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.INPUT),
        'house_price_prediction': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.PREDICTION),
        'house_price_target': FeatureType(
            data_type=DataType.FLOAT,
            variable_type=VariableType.CONTINUOUS,
            column_type=ColumnType.TARGET)
    }
    data_frame = pd.DataFrame({'square_feet': [11.23, 23.45, 11.23, 45.56, 11.23],
                               'house_price_target': [1, 2, 3, 4, 5],
                               'house_price_prediction': [1.1, 2.5, 3.8, 5.1, 4.9]})
    metric_details = MetricDetail(univariate_metric={},
                                  dataset_metrics=[MetricMetadata(klass=RootMeanSquaredError)])

    runner = InsightsBuilder() \
        .with_input_schema(input_schema) \
        .with_data_frame(data_frame=data_frame) \
        .with_metrics(metrics=metric_details) \
        .with_engine(engine=EngineDetail(engine_name="native")) \
        .build()

    profile_json = runner.run().profile.to_json()
    dataset_metrics = profile_json['dataset_metrics']
    print(dataset_metrics["RootMeanSquaredError"])
if __name__ == "__main__":
    main()

Returns the standard metric result as:
{
    'metric_name': 'RootMeanSquaredError',
    'metric_description': 'Computes Root Mean Squared Error regression loss',
    'variable_count': 1,
    'variable_names': ['root_mean_squared_error'],
    'variable_types': [CONTINUOUS],
    'variable_dtypes': [FLOAT],
    'variable_dimensions': [0],
    'metric_data': [1.1],
    'metadata': {},
    'error': None
}
compute(dataset: DataFrame, **kwargs: Any) None

Computes the numerator of the Root Mean Squared Error for the passed-in dataset

Parameters

dataset: pd.DataFrame

DataFrame object for either the entire dataset or a partition on which a Metric is being computed

classmethod create(config: Dict[str, ConfigParameter] | None = None, **kwargs: Any) RootMeanSquaredError

Create a RootMeanSquaredError metric using the configuration and kwargs

Parameters

config: Optional[Dict[str, ConfigParameter]]

Metric configuration

get_result(**kwargs: Any) Dict[str, Any]

Returns the computed value of the metric

Returns

Dict[str, Any]: Dictionary with key as string and value as any metric property.

get_standard_metric_result(**kwargs: Any) StandardMetricResult

This method returns metric output in standard format.

Returns

StandardMetricResult

merge(other_metric: RootMeanSquaredError, **kwargs: Any) RootMeanSquaredError

Merge two RootMeanSquaredError metrics into one, without mutating either.

Parameters

other_metric: RootMeanSquaredError

The other RootMeanSquaredError to be merged.

Returns

RootMeanSquaredError

A new instance of RootMeanSquaredError

prediction_column: str = 'y_predict'
prediction_score_column: str = 'y_score'
sum_of_squared_residuals: float = 0.0
target_column: str = 'y_true'
total_count: int = 0