
Fairlearn currently provides Demographic Parity, Equalized Odds, and True Positive Rate Parity as fairness constraints for the ExponentiatedGradient unfairness mitigation technique. Is it possible to use a custom fairness constraint? If so, how would I write my constraint?

Some constraints I'd be interested in are:

  • False Positive Parity
  • Parity between certain subgroups only, e.g. parity between male/female/non-binary gender within the age bucket <35 (but not necessarily parity with subgroups from other buckets); similarly for age bucket 35-55, etc.

Any ideas, hints, or pointers to documentation would be useful!

adrin
Roman Lutz
  • Welcome to SO, which is about *specific coding* issues and not about ideas or hints; recommendations for external resources is also explicitly off-topic. Please do go through the help system first. – desertnaut Apr 24 '20 at 10:20

1 Answer


Constraints for reduction methods like ExponentiatedGradient are implemented at https://github.com/fairlearn/fairlearn/tree/master/fairlearn/reductions/_moments

False Positive Rate Parity

The repository already contains True Positive Rate Parity at https://github.com/fairlearn/fairlearn/blob/e284727233fc6eb341d99202d3f4f4f8ff046b22/fairlearn/reductions/_moments/conditional_selection_rate.py#L194

The way constraints are implemented is very general, so we can actually create a lot of different constraints based on the ConditionalSelectionRate base class. By default any subclass will try to enforce parity between groups as defined by sensitive_features. You can, however, specify further partitions using the event argument. For the demographic parity constraint that's not required, so it just says

super().load_data(X, y, event=_ALL, **kwargs)

where _ALL is just a string. That means every sample gets the same event and we don't partition further. Equalized Odds, on the other hand, needs to break down further by label, so it says

super().load_data(X, y,
                  event=pd.Series(y).apply(lambda y: _LABEL + "=" + str(y)),
                  **kwargs)

This means that in addition to grouping by sensitive_features value, we also group by label within each sensitive feature group. The constraints then target parity between sensitive feature groups that share the same label. Equalized Odds is equivalent to enforcing True Positive Rate Parity and False Positive Rate Parity together, so False Positive Rate Parity needs something similar with fewer constraints. True Positive Rate Parity is already implemented as

super().load_data(X, y,
                  event=pd.Series(y).apply(lambda y: _LABEL + "=" + str(y)).where(y == 1),
                  **kwargs)

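This is the same as for Equalized Odds, plus a where clause that keeps the event only for samples with label 1. For all other samples the event becomes NaN, so they are ignored by the constraint. For False Positive Rate Parity we want the mirror image, keeping only samples with label 0, so just use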

super().load_data(X, y,
                  event=pd.Series(y).apply(lambda y: _LABEL + "=" + str(y)).where(y == 0),
                  **kwargs)

Done!
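To see what the where clause does to the event, here is a minimal pandas-only sketch. It does not call Fairlearn at all; the toy data, the column names, and the value of _LABEL are made up for illustration (the answer only says these constants are plain strings). It builds the event exactly as in the snippet above and then computes the per-group rate of positive predictions among label-0 samples, which is the false positive rate the constraint is meant to equalize.

```python
import pandas as pd

_LABEL = "label"  # assumed placeholder value, not taken from the Fairlearn source

# Hypothetical toy data: true labels, predictions, one sensitive feature.
y = pd.Series([0, 0, 1, 0, 0, 1])
y_pred = pd.Series([1, 0, 1, 0, 0, 1])
group = pd.Series(["a", "a", "a", "b", "b", "b"])

# Same expression as above: "label=0" where y == 0, NaN everywhere else.
event = y.apply(lambda label: _LABEL + "=" + str(label)).where(y == 0)

# groupby drops rows whose key is NaN, so only label-0 samples remain.
# The mean prediction per (event, group) cell is that group's false positive rate.
fpr_by_group = (
    pd.DataFrame({"event": event, "group": group, "pred": y_pred})
    .groupby(["event", "group"])["pred"]
    .mean()
)
print(fpr_by_group)
```

On this toy data only the four label-0 rows survive, so there is one cell per sensitive feature group and no cell for label 1.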

Other subgroup parity constraints

EqualizedOdds already provides a great example of how to achieve this. Instead of setting the event based on the label we can also use other features, e.g.

# assuming X as DataFrame and with age column
age_buckets = X['age'].apply(lambda age: 0 if age < 35 else 1 if age < 55 else 2)
super().load_data(X, y,
                  event=age_buckets.apply(lambda bucket: "age_bucket=" + str(bucket)),
                  **kwargs)

Note that this will try to achieve parity between all sensitive feature groups within bucket 0, but not across buckets. The same holds within bucket 1 and within bucket 2.
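As a rough illustration of "within buckets, not across buckets", the following pandas-only sketch reuses the bucketing expression from above on made-up data and lists the selection rate for every (event, gender) cell. Nothing here is Fairlearn API: the data, the gender column, and the max-minus-min gap are assumptions chosen only to show which cells get compared with each other.

```python
import pandas as pd

# Hypothetical toy data; "gender" plays the role of sensitive_features.
X = pd.DataFrame({
    "age": [22, 31, 29, 40, 50, 47, 60, 71],
    "gender": ["f", "m", "nb", "f", "m", "m", "f", "m"],
})
y_pred = pd.Series([1, 0, 1, 1, 1, 0, 0, 0])

# Same bucketing and event construction as in the snippet above.
age_buckets = X["age"].apply(lambda age: 0 if age < 35 else 1 if age < 55 else 2)
event = age_buckets.apply(lambda bucket: "age_bucket=" + str(bucket))

# Selection rate (mean prediction) for each (event, gender) cell.
rates = (
    pd.DataFrame({"event": event, "gender": X["gender"], "pred": y_pred})
    .groupby(["event", "gender"])["pred"]
    .mean()
)

# Illustrative summary only: largest gap between genders inside each bucket.
gap_per_bucket = rates.groupby(level="event").agg(lambda r: r.max() - r.min())
print(rates)
print(gap_per_bucket)
```

Each gap is computed from cells that share the same event, so a large difference between, say, bucket 0 and bucket 2 never enters the comparison.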

Any generally useful constraints should definitely be contributed back to the repository!

Roman Lutz