I'm using the AI Fairness 360 package to get fairness metrics on a dataset. I've already converted the data to a StandardDataset instance. If I understand correctly, this will change all values of protected attributes to 1 or 0: 1 meaning "belongs to a privileged group for this attribute", and 0 meaning "belongs to an unprivileged group for this attribute".

When calculating fairness metrics, I need to create a BinaryLabelDatasetMetric instance, for which I have to specify which combinations of protected attribute values make up my privileged and unprivileged groups. But why do I need to provide the attribute values that are privileged/unprivileged? After converting to a StandardDataset, all privileged values are 1 and all unprivileged values are 0. Am I missing something? If not, always treating 1 as privileged would be much easier.

So in summary, my question is: can the values for protected attributes in a StandardDataset ever be anything other than 1 or 0? If yes, in what case? (If no, it seems the API could be simplified a lot, by just requiring the names of the protected attributes and not the values.)

Willem

1 Answer

can the values for protected attributes in a StandardDataset ever be anything other than 1 or 0?

Yes. Protected attributes can hold other values in the original dataset. StandardDataset maps them to 1 and 0 only if they are not already numerical; numerical values are kept as they are. From the source:

privileged_classes (list(list or function)): Each element is a list of values which are considered privileged or a boolean function which return True if privileged for the corresponding column in protected_attribute_names. All others are unprivileged. Values are mapped to 1 (privileged) and 0 (unprivileged) if they are not already numerical.
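To make the rule in that docstring concrete, here is a minimal plain-Python sketch of how I read it. The helper `map_protected_column` is hypothetical and written only for illustration; it is not AIF360 code, and the real implementation works on DataFrame columns rather than lists.

```python
from numbers import Number


def map_protected_column(values, privileged_classes):
    """Hypothetical helper that mimics the documented mapping rule.

    This is an illustration of the docstring, not the AIF360 implementation.
    """
    # Case 1: a boolean function decides which values are privileged.
    if callable(privileged_classes):
        return [1.0 if privileged_classes(v) else 0.0 for v in values]
    # Case 2: the column is already numerical, so it is left unchanged.
    if all(isinstance(v, Number) for v in values):
        return list(values)
    # Case 3: non-numerical values listed as privileged become 1, all others 0.
    return [1.0 if v in privileged_classes else 0.0 for v in values]


# Strings are mapped to 1 (privileged) and 0 (unprivileged).
sex = map_protected_column(["male", "female", "male"], ["male"])

# Numerical values survive as they are, so they need not be 0 or 1.
age_raw = map_protected_column([25, 40, 67], [40, 67])

# A boolean function yields a 0/1 coding even for a numerical column.
age_binary = map_protected_column([25, 40, 67], lambda age: age >= 40)
```

Under this reading, the `age_raw` column still contains values such as 25 and 67 after conversion, which would explain why the metric classes ask for the privileged and unprivileged values explicitly.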

You can also check out the example here to see that in action by altering the gender attribute (e.g., replacing 0 and 1 with 'female' and 'male' right before passing to StandardDataset).

Specifying the values does seem unnecessary if the privileged class is already coded as 1 and the unprivileged class as 0. But when that is not the case, requiring such a coding would force users to manipulate the original dataset, which is not desirable.

Reveille