For a text classification experiment, I'm trying to calculate a weighted random baseline for a class distribution. I have three labels. This is some code I found for two labels: 'm' and 'f'.
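
    def wrb(distribution):  # weighted random baseline
        total = 0  # renamed from `sum` to avoid shadowing the built-in
        # A single float is treated as the proportion of one of two classes
        if isinstance(distribution, float):
            elem2 = 1 - distribution
            distribution = [distribution, elem2]
        # Add up the squared proportion of each class
        for prop in distribution:
            total += prop**2
        return total

    distr = labels.count('m')/len(labels)
    print('WRB', wrb(distr))
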
My question is: which of my labels do I need to fill in, in place of the 'm' in distr = labels.count('m')/len(labels)? Is there a rule, or do I literally choose one of my three labels at random?
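
For reference, here is a minimal sketch of how I think the same computation might extend to three labels, assuming the baseline is simply the sum of the squared class proportions, as in the loop of the code I found. The function name wrb_multiclass and the example label list are hypothetical, made up for illustration only.

```python
from collections import Counter


def wrb_multiclass(labels):
    """Sum of squared class proportions over all labels present (assumed definition)."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)  # label -> number of occurrences
    return sum((count / n) ** 2 for count in counts.values())


# Hypothetical example data with three labels (not my real dataset)
example_labels = ['a'] * 5 + ['b'] * 3 + ['c'] * 2
print('WRB', wrb_multiclass(example_labels))
```

If this is right, no single label has to be picked, because every label's proportion enters the sum. In the two-label case it should give the same value whether 'm' or 'f' is counted, since p**2 + (1 - p)**2 is unchanged when p and 1 - p are swapped.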