Constraints for reductions methods like ExponentiatedGradient are implemented at https://github.com/fairlearn/fairlearn/tree/master/fairlearn/reductions/_moments
False Positive Rate Parity
The repository already contains True Positive Rate Parity at https://github.com/fairlearn/fairlearn/blob/e284727233fc6eb341d99202d3f4f4f8ff046b22/fairlearn/reductions/_moments/conditional_selection_rate.py#L194
The way constraints are implemented is very general, so we can create many different constraints based on the ConditionalSelectionRate base class. By default, any subclass will try to enforce parity between groups as defined by sensitive_features. You can, however, specify further partitions using the event argument. For the demographic parity constraint that's not required, so it just says
super().load_data(X, y, event=_ALL, **kwargs)
where _ALL is just a string. That means every sample gets the same event and we don't partition further.
Equalized Odds, on the other hand, needs to break down further by label, so it says
super().load_data(X, y,
                  event=pd.Series(y).apply(lambda y: _LABEL + "=" + str(y)),
                  **kwargs)
This means that in addition to grouping by sensitive_features value, we also group by label within each sensitive feature group. Our constraints will target parity between the sensitive feature groups that share the same label.
If you're familiar with Equalized Odds, you know that it is equivalent to enforcing both True Positive Rate Parity and False Positive Rate Parity. For False Positive Rate Parity alone we need something similar, but with fewer constraints.
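To make the decomposition above concrete, here is a minimal sketch in plain Python (not fairlearn code) that computes the per-group rate of positive predictions among samples with a given true label. With label 1 this is the true positive rate, with label 0 the false positive rate. The helper name rate_by_group and the toy data are made up for illustration.

```python
from collections import defaultdict


def rate_by_group(y_true, y_pred, groups, label):
    """Per group: fraction of positive predictions among samples whose true label is `label`."""
    selected = defaultdict(int)
    total = defaultdict(int)
    for yt, yp, g in zip(y_true, y_pred, groups):
        if yt == label:  # samples with any other label are ignored
            total[g] += 1
            selected[g] += yp
    return {g: selected[g] / total[g] for g in total}


# Toy data (hypothetical): two sensitive feature groups "a" and "b"
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

tpr = rate_by_group(y_true, y_pred, groups, label=1)  # true positive rate per group
fpr = rate_by_group(y_true, y_pred, groups, label=0)  # false positive rate per group
print(tpr, fpr)
```

Equalized Odds asks for both dictionaries to be (approximately) constant across groups, while False Positive Rate Parity only looks at the second one.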
True Positive Rate Difference is already implemented as
super().load_data(X, y,
                  event=pd.Series(y).apply(lambda y: _LABEL + "=" + str(y)).where(y == 1),
                  **kwargs)
This is the same as for Equalized Odds, but we add a where clause that keeps only the samples whose label is 1. Samples with any other label are ignored for the constraint. For FPR we want the same thing restricted to label 0, so just use
super().load_data(X, y,
                  event=pd.Series(y).apply(lambda y: _LABEL + "=" + str(y)).where(y == 0),
                  **kwargs)
Done!
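To see what the where clause does to the event column, here is a small standalone sketch using only pandas. It assumes y is a pandas Series (so that y == 0 is an elementwise comparison), and the value "label" for _LABEL is a placeholder chosen for this illustration, not necessarily the string the library uses.

```python
import pandas as pd

_LABEL = "label"  # placeholder value, assumed for illustration

# Toy labels (hypothetical)
y = pd.Series([1, 0, 0, 1, 0])

# Same expression as in the FPR snippet: build the event string,
# then blank out every sample whose label is not 0.
event = y.apply(lambda label: _LABEL + "=" + str(label)).where(y == 0)

print(event)
```

Samples with label 1 end up with a missing event, so only the label 0 samples carry an event for the constraint to partition on.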
Other subgroup parity constraints
EqualizedOdds already provides a great example of how to achieve this. Instead of setting the event based on the label we can also use other features, e.g.
# assuming X as DataFrame and with age column
age_buckets = X['age'].apply(lambda age: 0 if age < 35 else 1 if age < 55 else 2)
super().load_data(X, y,
                  event=age_buckets.apply(lambda bucket: "age_bucket=" + str(bucket)),
                  **kwargs)
Note that this will try to achieve parity between all sensitive feature groups within bucket 0, but not across buckets. The same holds within bucket 1 and within bucket 2.
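The "within a bucket, not across buckets" behaviour can be illustrated with a rough sketch in plain Python (again not fairlearn code, and without pandas to keep it dependency-free). It computes the selection rate for every (event, sensitive feature group) cell. The helper names and toy data are hypothetical.

```python
from collections import defaultdict


def age_bucket(age):
    # Same bucketing rule as in the snippet above
    return 0 if age < 35 else 1 if age < 55 else 2


def selection_rates(ages, groups, y_pred):
    """Selection rate per (event, group) cell."""
    counts = defaultdict(lambda: [0, 0])  # (event, group) -> [selected, total]
    for age, g, yp in zip(ages, groups, y_pred):
        cell = counts[("age_bucket=" + str(age_bucket(age)), g)]
        cell[0] += yp
        cell[1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


# Toy data (hypothetical)
ages = [22, 30, 41, 50, 60, 70, 28, 45]
groups = ["a", "b", "a", "b", "a", "b", "a", "b"]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]

rates = selection_rates(ages, groups, y_pred)
for key in sorted(rates):
    print(key, rates[key])
```

A parity constraint with this event compares the rates of groups "a" and "b" that share the same age_bucket value. It never compares a cell from bucket 0 with a cell from bucket 1 or 2.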
Any generally useful constraints should definitely be contributed back to the repository!