mlm_insights.core.metrics.bias_and_fairness package¶
Submodules¶
mlm_insights.core.metrics.bias_and_fairness.class_imbalance module¶
- class mlm_insights.core.metrics.bias_and_fairness.class_imbalance.ClassImbalance(config: Dict[str, ConfigParameter] = <factory>, feature_values_or_threshold: List[str] = <factory>, drop_nan_values: bool = False, feature_value_count: int = 0, total: int = 0, nan_count: int = 0)¶
Bases: MetricBase
Class imbalance (CI) bias occurs when a feature (also referred to as facet) value d has fewer training samples when compared with another feature value a in the dataset.
The formula for the class imbalance measure is:
CI = (n_a - n_d) / (n_a + n_d)
where,
n_a = the number of members of the insensitive group (feature_values_or_threshold)
n_d = the number of members of the sensitive group
Its values range over the interval [-1, 1].
A value close to 0 indicates a balanced feature.
A negative value indicates under-representation of the feature_values_or_threshold group.
A positive value indicates reverse bias, i.e. bias towards the insensitive group.
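The formula can be checked by hand with a few lines of plain Python. The sketch below is not the library's implementation: the helper name class_imbalance is hypothetical, and it assumes the column contains no missing values.

```python
from typing import Iterable, List


def class_imbalance(values: Iterable[str], group: List[str]) -> float:
    """Return (n_group - n_rest) / (n_group + n_rest) for a categorical column.

    Hypothetical helper for illustration; it is not part of mlm_insights.
    """
    observed = list(values)
    # Rows whose value belongs to the configured group.
    n_group = sum(1 for v in observed if v in group)
    # Every other row.
    n_rest = len(observed) - n_group
    if n_group + n_rest == 0:
        raise ValueError("column is empty")
    return (n_group - n_rest) / (n_group + n_rest)


gender = ['M', 'M', 'F', 'F', 'M', 'M', 'F']
ci = class_imbalance(gender, ['F'])
print(round(ci, 1))  # -0.1
```

With three 'F' rows and four other rows the value is (3 - 4) / 7, which is negative because the configured group is the smaller one.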
Configuration¶
- feature_values_or_threshold: List[str]
list of categorical values present in the dataset for the given feature
- drop_nan_values: bool
flag to exclude the nan values while calculating class imbalance
Exceptions¶
- InvalidParameterException
if the feature_values_or_threshold group is not present
if the feature_values_or_threshold group list is empty
Limitations¶
Currently supports only a list of categorical values for a given feature.
Returns¶
- class_imbalance_value: float
Class imbalance value.
- feature_values_or_threshold: List[str]
list of categorical values present in the dataset for the given feature
Examples
    import pandas as pd
    from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
    from mlm_insights.builder.insights_builder import InsightsBuilder
    from mlm_insights.constants.types import FeatureType, DataType, VariableType
    from mlm_insights.core.metrics.bias_and_fairness.class_imbalance import (
        ClassImbalance, CONFIG_KEY_FOR_FEATURE_VALUES_OR_THRESHOLD, CLASS_IMBALANCE_VALUE)
    from mlm_insights.core.metrics.metric_metadata import MetricMetadata


    def main():
        input_schema = {
            'transport': FeatureType(data_type=DataType.STRING, variable_type=VariableType.NOMINAL),
            'gender': FeatureType(data_type=DataType.STRING, variable_type=VariableType.NOMINAL)
        }
        data_frame = pd.DataFrame({
            'transport': ['bus', 'bus', 'train', 'walk', 'walk', 'car', 'car'],
            'gender': ['M', 'M', 'F', 'F', 'M', 'M', 'F']
        })
        # Attach ClassImbalance to the 'gender' feature, with 'F' as the configured group.
        metric_details = MetricDetail(
            univariate_metric={"gender": [
                MetricMetadata(klass=ClassImbalance,
                               config={CONFIG_KEY_FOR_FEATURE_VALUES_OR_THRESHOLD: ['F']})
            ]},
            dataset_metrics=[])
        runner = (InsightsBuilder()
                  .with_input_schema(input_schema)
                  .with_data_frame(data_frame=data_frame)
                  .with_metrics(metrics=metric_details)
                  .with_engine(engine=EngineDetail(engine_name='native'))
                  .build())
        run_result = runner.run()
        profile = run_result.profile
        # Look the metric up with the same metadata it was registered with.
        profile_json = profile.get_feature('gender').get_metric(
            MetricMetadata(klass=ClassImbalance,
                           config={CONFIG_KEY_FOR_FEATURE_VALUES_OR_THRESHOLD: ["F"]})
        ).get_result()
        print(round(profile_json[CLASS_IMBALANCE_VALUE], 1))  # -0.1


    if __name__ == '__main__':
        main()

Returns the standard metric result as:

    {
        'metric_name': 'ClassImbalance',
        'metric_description': 'Class Imbalance Bias',
        'variable_count': 2,
        'variable_names': [class_imbalance_value],
        'variable_types': [CONTINUOUS,NOMINAL],
        'variable_dtypes':[FLOAT,STRING],
        'variable_dimensions': [1,2],
        'metric_data': [<float value>],
        'metadata': {},
        'error': None
    }
- compute(column: Series, **kwargs: Any) -> None¶
Computes the class imbalance of the given feature based on the feature_values_or_threshold group value.
Parameters¶
- column: pd.Series
Input column.
- compute_ci() -> float¶
- classmethod create(config: Dict[str, ConfigParameter] | None = None) -> ClassImbalance¶
Factory Method to create an object. The configuration will be available in config.
Returns¶
- ClassImbalance
An instance of ClassImbalance.
- drop_nan_values: bool = False¶
- feature_value_count: int = 0¶
- feature_values_or_threshold: List[str]¶
- get_result(**kwargs: Any) -> Dict[str, Any]¶
Returns class imbalance of input data.
Returns¶
Dict[str, Any]: class imbalance of the data, in the form {"class_imbalance_value": <CI value>, "feature_values_or_threshold": ["value"]}
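A caller will usually want to turn the returned dictionary into a readable verdict. The following is a minimal sketch of one way to do that, using the two keys listed above; the function describe_ci and its tolerance parameter are hypothetical and not part of the library, and the 0.05 cut-off is an arbitrary choice for illustration.

```python
from typing import Any, Dict


def describe_ci(result: Dict[str, Any], tolerance: float = 0.05) -> str:
    """Summarise a get_result()-style dictionary in one sentence.

    Hypothetical helper; `tolerance` is an illustrative threshold only.
    """
    ci = result["class_imbalance_value"]
    group = result["feature_values_or_threshold"]
    if abs(ci) <= tolerance:
        return f"{group} is roughly balanced against the rest (CI={ci:.2f})"
    if ci < 0:
        return f"{group} is under-represented (CI={ci:.2f})"
    return f"{group} is over-represented (CI={ci:.2f})"


# Dictionary shaped like the documented return value, filled in by hand.
example = {"class_imbalance_value": -1 / 7, "feature_values_or_threshold": ["F"]}
print(describe_ci(example))
```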
- get_standard_metric_result(**kwargs: Any) -> StandardMetricResult¶
Returns Standard Metric for class imbalance.
Returns¶
StandardMetricResult: class imbalance Metric in standard format.
- merge(other_metric: ClassImbalance, **kwargs: Any) -> ClassImbalance¶
Merges two ClassImbalance metrics into one, without mutating either of them.
Parameters¶
- other_metric: ClassImbalance
Other ClassImbalance metric that needs to be merged.
Returns¶
- ClassImbalance
A new instance of ClassImbalance containing the feature_values_or_threshold list, drop_nan_values, and the feature_value_count, nan_count and total values after merging.
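Keeping counters rather than a finished CI value is what makes merging possible: two partial CI values cannot simply be averaged, but their counts can be combined. The sketch below illustrates the idea with a hypothetical stand-in class, CountState. It assumes that merging adds the counters and that total counts every row, which this page does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountState:
    """Hypothetical stand-in for the counters a ClassImbalance instance keeps."""
    feature_value_count: int = 0  # rows whose value is in the configured group
    total: int = 0                # all rows seen (assumption)

    def merge(self, other: "CountState") -> "CountState":
        # Build a new object; neither input is modified.
        return CountState(
            self.feature_value_count + other.feature_value_count,
            self.total + other.total,
        )

    def ci(self) -> float:
        n_group = self.feature_value_count
        n_rest = self.total - n_group
        return (n_group - n_rest) / (n_group + n_rest)


part_1 = CountState(feature_value_count=1, total=4)  # e.g. M, M, F, M
part_2 = CountState(feature_value_count=2, total=3)  # e.g. F, M, F
merged = part_1.merge(part_2)
print(round(merged.ci(), 3))  # -0.143
```

The partial values here are -0.5 and about 0.333, whose average is not -1/7; only the merged counts reproduce the value computed over all seven rows.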
- nan_count: int = 0¶
- total: int = 0¶