Fairness
Metrics
AutoMLx provides metrics dedicated to assessing and measuring the fairness of either a model or a dataset. Each metric corresponds to a different notion of fairness, so choose among them carefully, taking the specifics of your problem into account.
For maximal versatility, all supported metrics are offered in two formats:
- A scikit-learn-like Scorer object, which can be initialized and reused to test different models or datasets.
- A functional interface, which can easily be used for one-line computations.
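The relationship between the two formats can be sketched with a toy metric. This is a minimal illustration in plain Python, independent of AutoMLx: `positive_rate`, `PositiveRateScorer`, and `ThresholdModel` are hypothetical names and not part of the library.

```python
def positive_rate(y_pred):
    """Functional form: a one-line computation on ready-made predictions."""
    return sum(1 for p in y_pred if p == 1) / len(y_pred)


class PositiveRateScorer:
    """Scorer form: created once, then reused across models or datasets."""

    def __call__(self, model, X, y_true=None):
        # The scorer collects the predictions itself, then defers to the function.
        return positive_rate(model.predict(X))


class ThresholdModel:
    """Hypothetical stand-in for any object exposing predict(X)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [1 if x >= self.threshold else 0 for x in X]


X = [0.1, 0.4, 0.6, 0.9]
scorer = PositiveRateScorer()
strict = scorer(ThresholdModel(0.8), X)   # same scorer, first model
lenient = scorer(ThresholdModel(0.3), X)  # same scorer, second model
direct = positive_rate([1, 0, 1, 1])      # functional form, no model needed
```

The scorer form is convenient when predictions still have to be produced from a model; the functional form fits when predictions are already available.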
Evaluating a Model
Statistical Parity
- class automl.fairness.metrics.model.ModelStatisticalParityScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the statistical parity of a model’s output between subgroups and the rest of the population.
For each subgroup, statistical parity compares the likelihood of a positive prediction for instances in the subgroup against the same likelihood for the rest of the population.
Statistical Parity (also known as Base Rate or Disparate Impact) is calculated as PP / N, where PP and N are the number of Positive Predictions and total Number of predictions made, respectively.
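As a worked illustration of the PP / N definition, the following plain-Python sketch computes the positive-prediction rate for one subgroup and for the rest of the population. It does not call AutoMLx; the helper names and the toy data are assumptions made for the example.

```python
def positive_rate(predictions):
    """PP / N: share of positive (1) predictions among all predictions."""
    return sum(1 for p in predictions if p == 1) / len(predictions)


def parity_rates(y_pred, groups, subgroup):
    """Return (subgroup rate, rest-of-population rate) of positive predictions."""
    inside = [p for p, g in zip(y_pred, groups) if g == subgroup]
    rest = [p for p, g in zip(y_pred, groups) if g != subgroup]
    return positive_rate(inside), positive_rate(rest)


y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]

sub_rate, rest_rate = parity_rates(y_pred, sex, "F")  # 3 of 4 vs 1 of 4
diff = abs(sub_rate - rest_rate)
```

Note that no ground-truth labels appear anywhere in the computation, which matches the remark that this metric does not require y_true.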
- Perfect score
  A perfect score for this metric means that the model does not predict the positive class for any of the subgroups at a different rate than it does for the rest of the population. Perfect values are:
  - 1 if using 'ratio' as distance_measure.
  - 0 if using 'diff' as distance_measure.
- Parameters
  - protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
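The distance_measure and reduction options described in the parameter list can be sketched as two small functions. This is an illustrative reading of the parameter descriptions, not the library's implementation: `distance` and `reduce_scores` are hypothetical helpers, and division by zero is deliberately left unhandled because the documentation does not say how that case is treated.

```python
def distance(subgroup_val, rest_of_pop_val, distance_measure="diff"):
    """Compare a subgroup's metric value with the rest of the population."""
    if distance_measure == "diff":
        return abs(subgroup_val - rest_of_pop_val)
    if distance_measure == "ratio":
        ratio = subgroup_val / rest_of_pop_val
        # Inverted when below 1 so that the result is always >= 1.
        return ratio if ratio >= 1 else 1 / ratio
    raise ValueError(f"unknown distance_measure: {distance_measure!r}")


def reduce_scores(subgroup_scores, reduction="mean"):
    """Collapse per-subgroup scores into one value, or return them as a dict."""
    if reduction is None:
        return dict(subgroup_scores)
    values = list(subgroup_scores.values())
    if reduction == "max":
        return max(values)
    if reduction == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown reduction: {reduction!r}")


# Hypothetical (subgroup value, rest-of-population value) pairs.
rates = {"A": (0.5, 0.25), "B": (0.2, 0.4)}
diffs = {g: distance(s, r, "diff") for g, (s, r) in rates.items()}
ratios = {g: distance(s, r, "ratio") for g, (s, r) in rates.items()}

mean_diff = reduce_scores(diffs, "mean")
max_diff = reduce_scores(diffs, "max")
all_diffs = reduce_scores(diffs, None)
```

Because 'ratio' is inverted when needed, a subgroup that scores half as high as the rest of the population (B) receives the same distance as one that scores twice as high (A).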
Examples
from automl.fairness.metrics import ModelStatisticalParityScorer
scorer = ModelStatisticalParityScorer(['race', 'sex'])
scorer(model, X, y_true)
This metric does not require y_true. It can also be called using
scorer(model, X)
- __call__(model, X, y_true=None, supplementary_features=None)
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
  - model (object) – Object that implements a predict(X) function to collect categorical predictions.
  - X (pandas.DataFrame) – Array of instances to compute the metric on.
  - y_true (pandas.Series, numpy.ndarray, list, optional) – Array of groundtruth labels. Default is None.
  - supplementary_features (pandas.DataFrame, optional) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to empty DataFrame.
- Returns
  The computed metric value, with format according to self.reduction.
- Return type
- Raises
  AutoMLxValueError – if a feature is present in both X and supplementary_features.
- automl.fairness.metrics.model.model_statistical_parity(y_true=None, y_pred=None, subgroups=None, distance_measure='diff', reduction='mean')
Measures the statistical parity of a model’s output between subgroups and the rest of the population.
For more details on the computation of this metric, refer to ModelStatisticalParityScorer.
- Parameters
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
  - subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
- Returns
  The computed metric value, with format according to reduction.
- Return type
Examples
from automl.fairness.metrics import model_statistical_parity
subgroups = X[['race', 'sex']]
model_statistical_parity(y_true, y_pred, subgroups)
This metric does not require y_true. It can also be called using
model_statistical_parity(None, y_pred, subgroups)
model_statistical_parity(y_pred=y_pred, subgroups=subgroups)
True Positive Rate Disparity
- class automl.fairness.metrics.model.TruePositiveRateScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s true positive rate between subgroups and the rest of the population.
For each subgroup, the disparity is measured by comparing the true positive rate on instances of a subgroup against the rest of the population.
True Positive Rate (also known as TPR, recall, or sensitivity) is calculated as TP / (TP + FN), where TP and FN are the number of true positives and false negatives, respectively.
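The TP / (TP + FN) formula and the subgroup-versus-rest comparison can be worked through on toy data. The sketch below is plain Python and does not use AutoMLx; `tpr` and `split_by_subgroup` are hypothetical helpers, and label 1 is assumed to be the positive class.

```python
def tpr(y_true, y_pred):
    """TP / (TP + FN), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp / (tp + fn)


def split_by_subgroup(values, groups, subgroup):
    """Separate values into (subgroup, rest of the population)."""
    inside = [v for v, g in zip(values, groups) if g == subgroup]
    rest = [v for v, g in zip(values, groups) if g != subgroup]
    return inside, rest


y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

true_a, true_rest = split_by_subgroup(y_true, groups, "A")
pred_a, pred_rest = split_by_subgroup(y_pred, groups, "A")
tpr_a = tpr(true_a, pred_a)            # 3 TP, 1 FN
tpr_rest = tpr(true_rest, pred_rest)   # 1 TP, 1 FN
tpr_diff = abs(tpr_a - tpr_rest)
```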
- Perfect score
  A perfect score for this metric means that the model does not correctly predict the positive class for any of the subgroups more often than it does for the rest of the population. Perfect values are:
  - 1 if using 'ratio' as distance_measure.
  - 0 if using 'diff' as distance_measure.
- Parameters
  - protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
Examples
from automl.fairness.metrics import TruePositiveRateScorer
scorer = TruePositiveRateScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__(model, X, y_true, supplementary_features=None)
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
  - model (object) – Object that implements a predict(X) function to collect categorical predictions.
  - X (pandas.DataFrame) – Array of instances to compute the metric on.
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - supplementary_features (pandas.DataFrame, optional) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to empty DataFrame.
- Returns
  The computed metric value, with format according to self.reduction.
- Return type
- Raises
  AutoMLxValueError – if a feature is present in both X and supplementary_features.
- automl.fairness.metrics.model.true_positive_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s true positive rate between subgroups and the rest of the population.
For more details on the computation of this metric, refer to TruePositiveRateScorer.
- Parameters
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
  - subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
- Returns
  The computed metric value, with format according to reduction.
- Return type
Examples
from automl.fairness.metrics import true_positive_rate
subgroups = X[['race', 'sex']]
true_positive_rate(y_true, y_pred, subgroups)
False Positive Rate Disparity
- class automl.fairness.metrics.model.FalsePositiveRateScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false positive rate between subgroups and the rest of the population.
For each subgroup, the disparity is measured by comparing the false positive rate on instances of a subgroup against the rest of the population.
False Positive Rate (also known as FPR or fall-out) is calculated as FP / (FP + TN), where FP and TN are the number of false positives and true negatives, respectively.
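A small worked example of FP / (FP + TN), this time compared with the 'ratio' distance. The code is a plain-Python sketch that does not call AutoMLx; `fpr` is a hypothetical helper, the data are made up, and label 1 is assumed to be the positive class.

```python
def fpr(y_true, y_pred):
    """FP / (FP + TN), treating label 1 as the positive class."""
    # Predictions made on truly negative instances: each is either FP or TN.
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    fp = sum(1 for p in negatives if p == 1)
    return fp / len(negatives)


# Toy labels and predictions for one subgroup and for everyone else.
sub_true, sub_pred = [0, 0, 0, 0, 1], [1, 0, 0, 0, 1]
rest_true, rest_pred = [0, 0, 1, 1], [1, 0, 1, 1]

fpr_sub = fpr(sub_true, sub_pred)      # 1 FP, 3 TN
fpr_rest = fpr(rest_true, rest_pred)   # 1 FP, 1 TN
ratio = fpr_sub / fpr_rest
fpr_ratio = ratio if ratio >= 1 else 1 / ratio  # inverted to stay >= 1
```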
- Perfect score
  A perfect score for this metric means that the model does not incorrectly predict the positive class for any of the subgroups more often than it does for the rest of the population. Perfect values are:
  - 1 if using 'ratio' as distance_measure.
  - 0 if using 'diff' as distance_measure.
- Parameters
  - protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
Examples
from automl.fairness.metrics import FalsePositiveRateScorer
scorer = FalsePositiveRateScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__(model, X, y_true, supplementary_features=None)
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
  - model (object) – Object that implements a predict(X) function to collect categorical predictions.
  - X (pandas.DataFrame) – Array of instances to compute the metric on.
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - supplementary_features (pandas.DataFrame, optional) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to empty DataFrame.
- Returns
  The computed metric value, with format according to self.reduction.
- Return type
- Raises
  AutoMLxValueError – if a feature is present in both X and supplementary_features.
- automl.fairness.metrics.model.false_positive_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false positive rate between subgroups and the rest of the population.
For more details on the computation of this metric, refer to FalsePositiveRateScorer.
- Parameters
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
  - subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
- Returns
  The computed metric value, with format according to reduction.
- Return type
Examples
from automl.fairness.metrics import false_positive_rate
subgroups = X[['race', 'sex']]
false_positive_rate(y_true, y_pred, subgroups)
False Negative Rate Disparity
- class automl.fairness.metrics.model.FalseNegativeRateScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false negative rate between subgroups and the rest of the population.
For each subgroup, the disparity is measured by comparing the false negative rate on instances of a subgroup against the rest of the population.
False Negative Rate (also known as FNR or miss rate) is calculated as FN / (FN + TP), where FN and TP are the number of false negatives and true positives, respectively.
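Since FN / (FN + TP) is the complement of the true positive rate, the 'diff' disparity of the false negative rate always equals that of the true positive rate, whereas the 'ratio' disparity generally does not. The plain-Python sketch below illustrates this on made-up data; `fnr` is a hypothetical helper and nothing here calls AutoMLx.

```python
def fnr(y_true, y_pred):
    """FN / (FN + TP), treating label 1 as the positive class."""
    # Predictions made on truly positive instances: each is either FN or TP.
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    fn = sum(1 for p in positives if p == 0)
    return fn / len(positives)


sub_true, sub_pred = [1, 1, 1, 1], [1, 1, 1, 0]
rest_true, rest_pred = [1, 1, 0, 0], [1, 0, 0, 1]

fnr_sub, fnr_rest = fnr(sub_true, sub_pred), fnr(rest_true, rest_pred)
tpr_sub, tpr_rest = 1 - fnr_sub, 1 - fnr_rest  # TPR is the complement of FNR

fnr_diff = abs(fnr_sub - fnr_rest)
tpr_diff = abs(tpr_sub - tpr_rest)
fnr_ratio = max(fnr_sub / fnr_rest, fnr_rest / fnr_sub)  # always >= 1
tpr_ratio = max(tpr_sub / tpr_rest, tpr_rest / tpr_sub)  # always >= 1
```

On this data both differences are 0.25, while the ratios are 2.0 for the false negative rate and 1.5 for the true positive rate, so the choice between the two metrics matters mainly when 'ratio' is used.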
- Perfect score
  A perfect score for this metric means that the model does not incorrectly predict the negative class for any of the subgroups more often than it does for the rest of the population. Perfect values are:
  - 1 if using 'ratio' as distance_measure.
  - 0 if using 'diff' as distance_measure.
- Parameters
  - protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
Examples
from automl.fairness.metrics import FalseNegativeRateScorer
scorer = FalseNegativeRateScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__(model, X, y_true, supplementary_features=None)
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
  - model (object) – Object that implements a predict(X) function to collect categorical predictions.
  - X (pandas.DataFrame) – Array of instances to compute the metric on.
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - supplementary_features (pandas.DataFrame, optional) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to empty DataFrame.
- Returns
  The computed metric value, with format according to self.reduction.
- Return type
- Raises
  AutoMLxValueError – if a feature is present in both X and supplementary_features.
- automl.fairness.metrics.model.false_negative_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false negative rate between subgroups and the rest of the population.
For more details on the computation of this metric, refer to FalseNegativeRateScorer.
- Parameters
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
  - subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
- Returns
  The computed metric value, with format according to reduction.
- Return type
Examples
from automl.fairness.metrics import false_negative_rate
subgroups = X[['race', 'sex']]
false_negative_rate(y_true, y_pred, subgroups)
False Omission Rate Disparity
- class automl.fairness.metrics.model.FalseOmissionRateScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false omission rate between subgroups and the rest of the population.
For each subgroup, the disparity is measured by comparing the false omission rate on instances of a subgroup against the rest of the population.
False Omission Rate (also known as FOR) is calculated as FN / (FN + TN), where FN and TN are the number of false negatives and true negatives, respectively.
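Unlike the rates that are normalized by the true labels, FN / (FN + TN) is normalized by the negative predictions: it is the share of negative predictions that turn out to be wrong. A minimal plain-Python sketch on made-up data follows; `false_omission` is a hypothetical helper and does not come from AutoMLx.

```python
from collections import Counter


def false_omission(y_true, y_pred):
    """FN / (FN + TN): the share of negative predictions that are wrong."""
    counts = Counter(zip(y_true, y_pred))  # keys are (true label, predicted label)
    fn, tn = counts[(1, 0)], counts[(0, 0)]
    return fn / (fn + tn)


sub_true, sub_pred = [1, 0, 0, 0], [0, 0, 0, 0]
rest_true, rest_pred = [1, 0, 1, 1], [0, 0, 1, 1]

for_sub = false_omission(sub_true, sub_pred)      # 1 FN, 3 TN
for_rest = false_omission(rest_true, rest_pred)   # 1 FN, 1 TN
for_diff = abs(for_sub - for_rest)
```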
- Perfect score
  A perfect score for this metric means that the model does not make mistakes on the negative class for any of the subgroups more often than it does for the rest of the population. Perfect values are:
  - 1 if using 'ratio' as distance_measure.
  - 0 if using 'diff' as distance_measure.
- Parameters
  - protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
Examples
from automl.fairness.metrics import FalseOmissionRateScorer
scorer = FalseOmissionRateScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__(model, X, y_true, supplementary_features=None)
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
  - model (object) – Object that implements a predict(X) function to collect categorical predictions.
  - X (pandas.DataFrame) – Array of instances to compute the metric on.
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - supplementary_features (pandas.DataFrame, optional) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to empty DataFrame.
- Returns
  The computed metric value, with format according to self.reduction.
- Return type
- Raises
  AutoMLxValueError – if a feature is present in both X and supplementary_features.
- automl.fairness.metrics.model.false_omission_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false omission rate between subgroups and the rest of the population.
For more details on the computation of this metric, refer to FalseOmissionRateScorer.
- Parameters
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
  - subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
- Returns
  The computed metric value, with format according to reduction.
- Return type
Examples
from automl.fairness.metrics import false_omission_rate
subgroups = X[['race', 'sex']]
false_omission_rate(y_true, y_pred, subgroups)
False Discovery Rate Disparity
- class automl.fairness.metrics.model.FalseDiscoveryRateScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false discovery rate between subgroups and the rest of the population.
For each subgroup, the disparity is measured by comparing the false discovery rate on instances of a subgroup against the rest of the population.
False Discovery Rate (also known as FDR) is calculated as FP / (FP + TP), where FP and TP are the number of false positives and true positives, respectively.
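FP / (FP + TP) is normalized by the positive predictions, which makes it the complement of precision. The following plain-Python sketch works this out on made-up data; `fdr` is a hypothetical helper, not an AutoMLx function.

```python
def fdr(y_true, y_pred):
    """FP / (FP + TP): the share of positive predictions that are wrong."""
    # True labels of the instances predicted positive: each is either FP or TP.
    predicted_positive = [t for t, p in zip(y_true, y_pred) if p == 1]
    fp = sum(1 for t in predicted_positive if t == 0)
    return fp / len(predicted_positive)


sub_true, sub_pred = [1, 1, 1, 0], [1, 1, 1, 1]
rest_true, rest_pred = [1, 0, 0, 0], [1, 1, 0, 0]

fdr_sub = fdr(sub_true, sub_pred)      # 1 FP, 3 TP
fdr_rest = fdr(rest_true, rest_pred)   # 1 FP, 1 TP
precision_sub = 1 - fdr_sub            # FDR is the complement of precision
fdr_diff = abs(fdr_sub - fdr_rest)
```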
- Perfect score
  A perfect score for this metric means that the model does not make mistakes on the positive class for any of the subgroups more often than it does for the rest of the population. Perfect values are:
  - 1 if using 'ratio' as distance_measure.
  - 0 if using 'diff' as distance_measure.
- Parameters
  - protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
Examples
from automl.fairness.metrics import FalseDiscoveryRateScorer
scorer = FalseDiscoveryRateScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__(model, X, y_true, supplementary_features=None)
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
  - model (object) – Object that implements a predict(X) function to collect categorical predictions.
  - X (pandas.DataFrame) – Array of instances to compute the metric on.
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - supplementary_features (pandas.DataFrame, optional) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to empty DataFrame.
- Returns
  The computed metric value, with format according to self.reduction.
- Return type
- Raises
  AutoMLxValueError – if a feature is present in both X and supplementary_features.
- automl.fairness.metrics.model.false_discovery_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s false discovery rate between subgroups and the rest of the population.
For more details on the computation of this metric, refer to FalseDiscoveryRateScorer.
- Parameters
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
  - subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
- Returns
  The computed metric value, with format according to reduction.
- Return type
Examples
from automl.fairness.metrics import false_discovery_rate
subgroups = X[['race', 'sex']]
false_discovery_rate(y_true, y_pred, subgroups)
Error Rate Disparity
- class automl.fairness.metrics.model.ErrorRateScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s error rate between subgroups and the rest of the population.
For each subgroup, the disparity is measured by comparing the error rate on instances of a subgroup against the rest of the population.
Error Rate (also known as inaccuracy) is calculated as (FP + FN) / N, where FP and FN are the number of false positives and false negatives, respectively, while N is the total Number of instances.
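The (FP + FN) / N formula counts every misclassified instance, whatever its class. The plain-Python sketch below computes the 'diff' disparity for each subgroup of a single attribute and keeps the results in a per-subgroup dict; `error_rate_of` and the toy data are assumptions made for the example, and AutoMLx is not involved.

```python
def error_rate_of(y_true, y_pred):
    """(FP + FN) / N: the share of instances that are misclassified."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_subgroup = {}
for subgroup in sorted(set(groups)):
    inside = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g == subgroup]
    rest = [(t, p) for t, p, g in zip(y_true, y_pred, groups) if g != subgroup]
    sub_rate = error_rate_of(*zip(*inside))   # unzip pairs into (labels, predictions)
    rest_rate = error_rate_of(*zip(*rest))
    per_subgroup[subgroup] = abs(sub_rate - rest_rate)
```

With only two subgroups, each one's "rest of the population" is the other subgroup, so both entries of the dict are identical.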
- Perfect score
  A perfect score for this metric means that the model does not make mistakes for any of the subgroups more often than it does for the rest of the population. Perfect values are:
  - 1 if using 'ratio' as distance_measure.
  - 0 if using 'diff' as distance_measure.
- Parameters
  - protected_attributes (pandas.Series, numpy.ndarray, list, str) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
Examples
from automl.fairness.metrics import ErrorRateScorer
scorer = ErrorRateScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__(model, X, y_true, supplementary_features=None)
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
  - model (object) – Object that implements a predict(X) function to collect categorical predictions.
  - X (pandas.DataFrame) – Array of instances to compute the metric on.
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - supplementary_features (pandas.DataFrame, optional) – Array of supplementary features for each instance. Used in case one attribute in self.protected_attributes is not contained by X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to empty DataFrame.
- Returns
  The computed metric value, with format according to self.reduction.
- Return type
- Raises
  AutoMLxValueError – if a feature is present in both X and supplementary_features.
- automl.fairness.metrics.model.error_rate(y_true, y_pred, subgroups, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s error rate between subgroups and the rest of the population.
For more details on the computation of this metric, refer to ErrorRateScorer.
- Parameters
  - y_true (pandas.Series, numpy.ndarray, list) – Array of groundtruth labels.
  - y_pred (pandas.Series, numpy.ndarray, list) – Array of model predictions.
  - subgroups (pandas.DataFrame) – Dataframe containing protected attributes for each instance.
  - distance_measure ('ratio' or 'diff', optional) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
    - 'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
    - 'diff': Uses | subgroup_val - rest_of_pop_val |.
    Default is 'diff'.
  - reduction ('max', 'mean' or None, optional) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
    - 'max': Returns the maximal value among all subgroup metrics.
    - 'mean': Returns the mean over all subgroup metrics.
    - None: Returns a {subgroup: subgroup_metric, ...} dict.
    Default is 'mean'.
- Returns
  The computed metric value, with format according to reduction.
- Return type
Examples
from automl.fairness.metrics import error_rate
subgroups = X[['race', 'sex']]
error_rate(y_true, y_pred, subgroups)
Equalized Odds
- class automl.fairness.metrics.model.EqualizedOddsScorer(protected_attributes, distance_measure='diff', reduction='mean')
Measures the disparity of a model’s true positive and false positive rates between subgroups and the rest of the population.
The disparity is measured by comparing the true positive and false positive rates on instances of a subgroup against the rest of the population.
True Positive Rate (also known as TPR, recall, or sensitivity) is calculated as TP / (TP + FN), where TP and FN are the number of true positives and false negatives, respectively.
False Positive Rate (also known as FPR or fall-out) is calculated as FP / (FP + TN), where FP and TN are the number of false positives and true negatives, respectively.
Equalized Odds is computed by taking the maximum of the TPR distance and the FPR distance between a subgroup and the rest of the population.
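The computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: labels and predictions are binary 0/1 values, the 'diff' distance measure is used, a single subgroup is compared against the rest, and the helper names `rates` and `equalized_odds_diff` are hypothetical.

```python
def rates(y_true, y_pred):
    """Return (TPR, FPR) for binary 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn)  # assumes at least one positive label
    fpr = fp / (fp + tn)  # assumes at least one negative label
    return tpr, fpr


def equalized_odds_diff(y_true, y_pred, in_subgroup):
    """Largest of the TPR gap and the FPR gap between a subgroup and the rest."""
    sub = [(t, p) for t, p, g in zip(y_true, y_pred, in_subgroup) if g]
    rest = [(t, p) for t, p, g in zip(y_true, y_pred, in_subgroup) if not g]
    tpr_s, fpr_s = rates(*zip(*sub))
    tpr_r, fpr_r = rates(*zip(*rest))
    return max(abs(tpr_s - tpr_r), abs(fpr_s - fpr_r))


# Toy data: the first four instances belong to the subgroup.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
in_subgroup = [True, True, True, True, False, False, False, False]
score = equalized_odds_diff(y_true, y_pred, in_subgroup)
```

In this toy data the subgroup has TPR = FPR = 0.5 while the rest of the population has TPR = 1 and FPR = 0, so both gaps contribute equally to the score.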
- Perfect score
-
A perfect score for this metric means that the model has the same TPR and FPR when comparing a subgroup to the rest of the population. Perfect values are:
-
1 if using 'ratio' as distance_measure.
-
0 if using 'diff' as distance_measure.
-
- Parameters
-
-
protected_attributes ( pandas.Series , numpy.ndarray , list , str ) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
-
distance_measure ( 'ratio' or 'diff' , optional ) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
-
'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
-
'diff': Uses | subgroup_val - rest_of_pop_val |.
Default is 'diff'.
-
reduction ( 'max' , 'mean' or None , optional ) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
-
'max': Returns the maximal value among all subgroup metrics.
-
'mean': Returns the mean over all subgroup metrics.
-
None: Returns a {subgroup: subgroup_metric, ...} dict.
Default is 'mean'.
-
Examples
from automl.fairness.metrics import EqualizedOddsScorer
scorer = EqualizedOddsScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__ ( model , X , y_true , supplementary_features = None )
-
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
-
-
model ( object ) – Object that implements a predict(X) function to collect categorical predictions.
-
X ( pandas.DataFrame ) – Array of instances to compute the metric on.
-
y_true ( pandas.Series , numpy.ndarray , list ) – Array of groundtruth labels.
-
supplementary_features ( pandas.DataFrame , optional ) – Array of supplementary features for each instance. Used when an attribute in self.protected_attributes is not contained in X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to an empty DataFrame.
-
- Returns
-
The computed metric value, with format according to self.reduction.
- Return type
- Raises
-
AutoMLxValueError – if a feature is present in both X and supplementary_features.
-
- automl.fairness.metrics.model. equalized_odds ( y_true , y_pred , subgroups , distance_measure = 'diff' , reduction = 'mean' )
-
Measures the disparity of a model’s true positive and false positive rates between subgroups and the rest of the population.
For more details on the computation of this metric, refer to EqualizedOddsScorer.
- Parameters
-
-
y_true ( pandas.Series , numpy.ndarray , list ) – Array of groundtruth labels.
-
y_pred ( pandas.Series , numpy.ndarray , list ) – Array of model predictions.
-
subgroups ( pandas.DataFrame ) – Dataframe containing protected attributes for each instance.
-
distance_measure ( 'ratio' or 'diff' , optional ) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
-
'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
-
'diff': Uses | subgroup_val - rest_of_pop_val |.
Default is 'diff'.
-
reduction ( 'max' , 'mean' or None , optional ) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
-
'max': Returns the maximal value among all subgroup metrics.
-
'mean': Returns the mean over all subgroup metrics.
-
None: Returns a {subgroup: subgroup_metric, ...} dict.
Default is 'mean'.
-
- Returns
-
The computed metric value, with format according to reduction .
- Return type
Examples
from automl.fairness.metrics import equalized_odds
subgroups = X[['race', 'sex']]
equalized_odds(y_true, y_pred, subgroups)
Theil Index
- class automl.fairness.metrics.model. TheilIndexScorer ( protected_attributes , distance_measure = None , reduction = 'mean' )
-
Measures the disparity of a model’s predictions according to groundtruth labels, as proposed by Speicher et al. [1].
Intuitively, the Theil Index can be thought of as a measure of the divergence between a subgroup’s different error distributions (i.e. false positives and false negatives) against the rest of the population.
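For reference, the Theil index defined by Speicher et al. [1] is the generalized entropy index with alpha = 1, computed over per-instance "benefits" b_i = y_pred_i - y_true_i + 1. The sketch below implements that textbook formula on a single set of instances. It is an illustration only: how the scorer applies the index to each subgroup is not described in this section, binary 0/1 labels are assumed, and the function name `theil_index_sketch` is hypothetical.

```python
import math


def theil_index_sketch(y_true, y_pred):
    # Benefit per instance, following Speicher et al.:
    # 0 for a false negative, 1 for a correct prediction, 2 for a false positive.
    benefits = [p - t + 1 for t, p in zip(y_true, y_pred)]
    mean_benefit = sum(benefits) / len(benefits)  # assumes a non-zero mean
    total = 0.0
    for b in benefits:
        ratio = b / mean_benefit
        if ratio > 0:  # the 0 * ln(0) term is taken to be 0
            total += ratio * math.log(ratio)
    return total / len(benefits)


# All predictions correct: every benefit equals the mean, so the index is 0.
perfect = theil_index_sketch([1, 0, 1, 0], [1, 0, 1, 0])
# One false negative and one false positive: benefits are unevenly spread.
skewed = theil_index_sketch([1, 1, 0, 0], [0, 1, 1, 0])
```

The index grows as benefits become more unevenly distributed across instances, which is why a value of 0 corresponds to the perfect score.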
- Perfect score
-
The perfect score for this metric is 0, meaning that the model does not have a different error distribution for any subgroup when compared to the rest of the population.
- Parameters
-
-
protected_attributes ( pandas.Series , numpy.ndarray , list , str ) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
-
reduction ( 'max' , 'mean' or None , optional ) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
-
'max': Returns the maximal value among all subgroup metrics.
-
'mean': Returns the mean over all subgroup metrics.
-
None: Returns a {subgroup: subgroup_metric, ...} dict.
Default is 'mean'.
-
References
- [1] Speicher, Till, et al. “A unified approach to quantifying algorithmic unfairness: Measuring individual & group unfairness via inequality indices.” Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018.
Examples
from automl.fairness.metrics import TheilIndexScorer
scorer = TheilIndexScorer(['race', 'sex'])
scorer(model, X, y_true)
- __call__ ( model , X , y_true , supplementary_features = None )
-
Computes the metric using a model’s predictions on a given array of instances X.
- Parameters
-
-
model ( object ) – Object that implements a predict(X) function to collect categorical predictions.
-
X ( pandas.DataFrame ) – Array of instances to compute the metric on.
-
y_true ( pandas.Series , numpy.ndarray , list ) – Array of groundtruth labels.
-
supplementary_features ( pandas.DataFrame , optional ) – Array of supplementary features for each instance. Used when an attribute in self.protected_attributes is not contained in X (e.g. if the protected attribute is not used by the model). Default is None, equivalent to an empty DataFrame.
-
- Returns
-
The computed metric value, with format according to self.reduction.
- Return type
- Raises
-
AutoMLxValueError – if a feature is present in both X and supplementary_features.
-
- automl.fairness.metrics.model. theil_index ( y_true , y_pred , subgroups , distance_measure = None , reduction = 'mean' )
-
Measures the disparity of a model’s predictions according to groundtruth labels, as proposed by Speicher et al. [1].
For more details on the computation of this metric, refer to TheilIndexScorer.
- Parameters
-
-
y_true ( pandas.Series , numpy.ndarray , list ) – Array of groundtruth labels.
-
y_pred ( pandas.Series , numpy.ndarray , list ) – Array of model predictions.
-
subgroups ( pandas.DataFrame ) – Dataframe containing protected attributes for each instance.
-
reduction ( 'max' , 'mean' or None , optional ) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
-
'max': Returns the maximal value among all subgroup metrics.
-
'mean': Returns the mean over all subgroup metrics.
-
None: Returns a {subgroup: subgroup_metric, ...} dict.
Default is 'mean'.
-
- Returns
-
The computed metric value, with format according to reduction .
- Return type
References
- [1] Speicher, Till, et al. “A unified approach to quantifying algorithmic unfairness: Measuring individual & group unfairness via inequality indices.” Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. 2018.
Examples
from automl.fairness.metrics import theil_index
subgroups = X[['race', 'sex']]
theil_index(y_true, y_pred, subgroups)
Evaluating a Dataset
Statistical Parity
- class automl.fairness.metrics.dataset. DatasetStatisticalParityScorer ( protected_attributes , distance_measure = 'diff' , reduction = 'mean' )
-
Measures the statistical parity of a dataset between subgroups and the rest of the population.
For each subgroup, statistical parity is computed as the ratio of positive labels in a subgroup.
Statistical Parity (also known as Base Rate or Disparate Impact) is calculated as PL / N, where PL and N are the number of Positive Labels and total number of instances, respectively.
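The PL / N computation, together with the `distance_measure` and `reduction` options, can be sketched in plain Python as follows. This is a minimal illustration, not the library's implementation: it assumes binary 0/1 labels and a single protected attribute given as a list of group names, and the helper names (`positive_rate`, `distance`, `dataset_parity_sketch`) are hypothetical.

```python
def positive_rate(labels):
    """PL / N: share of positive (1) labels."""
    return sum(labels) / len(labels)


def distance(sub_val, rest_val, distance_measure="diff"):
    if distance_measure == "diff":
        return abs(sub_val - rest_val)
    ratio = sub_val / rest_val  # assumes both rates are non-zero
    return ratio if ratio >= 1 else 1 / ratio  # inverted to always be >= 1


def dataset_parity_sketch(y_true, groups, distance_measure="diff", reduction="mean"):
    scores = {}
    for g in sorted(set(groups)):
        # Compare each subgroup against all remaining instances.
        sub = [y for y, s in zip(y_true, groups) if s == g]
        rest = [y for y, s in zip(y_true, groups) if s != g]
        scores[g] = distance(positive_rate(sub), positive_rate(rest), distance_measure)
    if reduction == "max":
        return max(scores.values())
    if reduction == "mean":
        return sum(scores.values()) / len(scores)
    return scores  # reduction=None: one value per subgroup


y_true = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
by_group = dataset_parity_sketch(y_true, groups, reduction=None)
worst_ratio = dataset_parity_sketch(y_true, groups, "ratio", "max")
```

Here group "a" has a positive-label rate of 0.75 and group "b" a rate of 0.25, so each subgroup differs from the rest by 0.5 under 'diff' and by a factor of about 3 under 'ratio'.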
- Perfect score
-
A perfect score for this metric means that the dataset does not have a different ratio of positive labels for a subgroup than it does for the rest of the population. Perfect values are:
-
1 if using 'ratio' as distance_measure.
-
0 if using 'diff' as distance_measure.
-
- Parameters
-
-
protected_attributes ( pandas.Series , numpy.ndarray , list , str ) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
-
distance_measure ( 'ratio' or 'diff' , optional ) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
-
'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
-
'diff': Uses | subgroup_val - rest_of_pop_val |.
Default is 'diff'.
-
reduction ( 'max' , 'mean' or None , optional ) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
-
'max': Returns the maximal value among all subgroup metrics.
-
'mean': Returns the mean over all subgroup metrics.
-
None: Returns a {subgroup: subgroup_metric, ...} dict.
Default is 'mean'.
-
Examples
from automl.fairness.metrics import DatasetStatisticalParityScorer
scorer = DatasetStatisticalParityScorer(['race', 'sex'])
scorer(X=X, y_true=y_true)
scorer(None, X, y_true)
- __call__ ( model = None , X = None , y_true = None , supplementary_features = None )
-
Computes the metric on a given array of instances X.
- Parameters
-
-
model ( object ) – Object that implements a predict(X) function to collect categorical predictions.
-
X ( pandas.DataFrame ) – Array of instances to compute the metric on.
-
y_true ( pandas.Series , numpy.ndarray , list ) – Array of groundtruth labels.
-
supplementary_features ( pandas.DataFrame , optional ) – Array of supplementary features for each instance. Used when an attribute in self.protected_attributes is not contained in X (e.g. if the protected attribute is not used by the model). Raises an AutoMLxValueError if a feature is present in both X and supplementary_features. Default is None, equivalent to an empty DataFrame.
-
- Returns
-
The computed metric value, with format according to self.reduction.
- Return type
- automl.fairness.metrics.dataset. dataset_statistical_parity ( y_true , subgroups , distance_measure = 'diff' , reduction = 'mean' )
-
Measures the statistical parity of a dataset between subgroups and the rest of the population.
For more details on the computation of this metric, refer to DatasetStatisticalParityScorer.
- Parameters
-
-
y_true (pandas.Series, numpy.ndarray, list, optional ) – Array of groundtruth labels
-
subgroups ( pandas.DataFrame ) – Dataframe containing protected attributes for each instance.
-
distance_measure ( 'ratio' or 'diff' , optional ) – Determines the distance used to compare a subgroup’s metric against the rest of the population. Possible values are:
-
'ratio': Uses (subgroup_val / rest_of_pop_val). Inverted to always be >= 1 if needed.
-
'diff': Uses | subgroup_val - rest_of_pop_val |.
Default is 'diff'.
-
reduction ( 'max' , 'mean' or None , optional ) – Determines how to reduce scores on all subgroups to a single output. Possible values are:
-
'max': Returns the maximal value among all subgroup metrics.
-
'mean': Returns the mean over all subgroup metrics.
-
None: Returns a {subgroup: subgroup_metric, ...} dict.
Default is 'mean'.
-
Examples
from automl.fairness.metrics import dataset_statistical_parity
subgroups = X[['race', 'sex']]
dataset_statistical_parity(y_true, subgroups)
Consistency
- class automl.fairness.metrics.dataset. ConsistencyScorer ( protected_attributes )
-
Measures the consistency of a dataset.
Consistency is measured as the ratio of instances that have a different label from their k=5 nearest neighbors.
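One plausible reading of this definition is sketched below in plain Python: for each instance, take the share of its k nearest neighbors that carry a different label, then average over all instances. The distance metric, the features used to find neighbors, and the exact aggregation are not specified in this section, so Euclidean distance over numeric features and the function name `consistency_sketch` are assumptions made for illustration only.

```python
import math


def consistency_sketch(X, y, k=5):
    """Mean share of each instance's k nearest neighbors with a different label."""
    total = 0.0
    for i, xi in enumerate(X):
        # Euclidean distance to every other instance (assumed metric).
        others = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbors = [j for _, j in others[:k]]
        total += sum(1 for j in neighbors if y[j] != y[i]) / len(neighbors)
    return total / len(X)


# Two well-separated clusters of six points each.
X = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1], [5.2, 5.0], [5.0, 5.2],
]
y_clean = [0] * 6 + [1] * 6          # labels agree within each cluster
y_noisy = [1] + [0] * 5 + [1] * 6    # one label flipped in the first cluster
clean_score = consistency_sketch(X, y_clean)
noisy_score = consistency_sketch(X, y_noisy)
```

With labels that agree inside each cluster the score is 0, matching the perfect score described for this metric; flipping a single label raises it.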
- Perfect score
-
A perfect score for this metric is 0, meaning that the dataset does not have different labels for instances that are similar to one another.
- Parameters
-
protected_attributes ( pandas.Series , numpy.ndarray , list , str ) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
Examples
from automl.fairness.metrics import ConsistencyScorer
scorer = ConsistencyScorer(['race', 'sex'])
scorer(X=X, y_true=y_true)
scorer(None, X, y_true)
- __call__ ( model = None , X = None , y_true = None , supplementary_features = None )
-
Call self as a function.
- automl.fairness.metrics.dataset. consistency ( y_true , subgroups )
-
Measures the consistency of a dataset.
For more details on the computation of this metric, refer to ConsistencyScorer.
- Parameters
-
-
y_true (pandas.Series, numpy.ndarray, list, optional ) – Array of groundtruth labels
-
subgroups ( pandas.DataFrame ) – Dataframe containing protected attributes for each instance.
-
Examples
from automl.fairness.metrics import consistency
subgroups = X[['race', 'sex']]
consistency(y_true, subgroups)
Smoothed EDF
- class automl.fairness.metrics.dataset. SmoothedEDFScorer ( protected_attributes )
-
Measures the smoothed Empirical Differential Fairness (EDF) of a dataset, as proposed by Foulds et al. [1].
Smoothed EDF returns the minimal exponential deviation of positive target ratios comparing a subgroup to the rest of the population.
This metric is related to DatasetStatisticalParityScorer with reduction='max' and distance_measure='ratio', with the only difference being that SmoothedEDFScorer returns a logarithmic value instead.
- Perfect score
-
A perfect score for this metric is 0, meaning that the dataset does not have a different ratio of positive labels for a subgroup than it does for the rest of the population.
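The relationship to statistical parity can be illustrated with a short plain-Python sketch: compute a smoothed positive-label rate for each subgroup and for the rest of the population, take the logarithm of their ratio, and keep the largest absolute value. This is an assumption-laden illustration rather than the library's implementation: the additive smoothing form, the `alpha` constant, binary 0/1 labels, a single protected attribute, and the names `smoothed_rate` and `smoothed_edf_sketch` are all hypothetical.

```python
import math


def smoothed_rate(labels, alpha=1.0):
    # Additive smoothing keeps the rate strictly between 0 and 1;
    # alpha is a hypothetical smoothing constant.
    return (sum(labels) + alpha) / (len(labels) + 2 * alpha)


def smoothed_edf_sketch(y_true, groups, alpha=1.0):
    worst = 0.0
    for g in set(groups):
        sub = [y for y, s in zip(y_true, groups) if s == g]
        rest = [y for y, s in zip(y_true, groups) if s != g]
        log_ratio = math.log(smoothed_rate(sub, alpha) / smoothed_rate(rest, alpha))
        # abs() of the log equals the log of the ratio inverted to be >= 1.
        worst = max(worst, abs(log_ratio))
    return worst


balanced = smoothed_edf_sketch([1, 0, 1, 0], ["a", "a", "b", "b"])
skewed = smoothed_edf_sketch(
    [1, 1, 1, 0, 1, 0, 0, 0],
    ["a", "a", "a", "a", "b", "b", "b", "b"],
)
```

Because log(1) = 0, a dataset whose subgroups all share the same positive-label rate scores 0, which is the logarithmic counterpart of the perfect 'ratio' value of 1.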
- Parameters
-
protected_attributes ( pandas.Series , numpy.ndarray , list , str ) – Array of attributes or single attribute that should be treated as protected. If an attribute is protected, then all of its unique values are considered as subgroups.
References
- [1] Foulds, James R., et al. “An intersectional definition of fairness.” 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.
Examples
from automl.fairness.metrics import SmoothedEDFScorer
scorer = SmoothedEDFScorer(['race', 'sex'])
scorer(X=X, y_true=y_true)
scorer(None, X, y_true)
- __call__ ( model = None , X = None , y_true = None , supplementary_features = None )
-
Call self as a function.
- automl.fairness.metrics.dataset. smoothed_edf ( y_true , subgroups )
-
Measures the smoothed Empirical Differential Fairness (EDF) of a dataset, as proposed by Foulds et al. [1].
For more details on the computation of this metric, refer to SmoothedEDFScorer.
- Parameters
-
-
y_true (pandas.Series, numpy.ndarray, list, optional ) – Array of groundtruth labels
-
subgroups ( pandas.DataFrame ) – Dataframe containing protected attributes for each instance.
-
References
- [1] Foulds, James R., et al. “An intersectional definition of fairness.” 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.
Examples
from automl.fairness.metrics import smoothed_edf
subgroups = X[['race', 'sex']]
smoothed_edf(y_true, subgroups)