
Let's say I have the following DataFrame:

letter     label
a            3         
b            2
c            3
a            1
a            1
c            3
b            2
a            5

I want to add a new column, label_pred, with a prediction for each row: for that row's label, count how often each letter occurs with it and assign the letter with the highest count. The expected output is:

letter     label          label_pred
a            3                  c        
b            2                  b
c            3                  c
a            1                  a
a            1                  a
c            3                  c
b            2                  b
a            5                  a

Each prediction is the letter with the highest count for that label. For example, the first row is predicted as c because label 3 occurs twice with c but only once with a. How would I do this?
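A minimal sketch, assuming pandas, that rebuilds the sample data and prints how often each letter occurs with each label. This makes the reasoning for the first row visible: under label 3, c appears twice and a once. The name `counts` is illustrative only.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "letter": ["a", "b", "c", "a", "a", "c", "b", "a"],
    "label": [3, 2, 3, 1, 1, 3, 2, 5],
})

# Series indexed by (label, letter): number of rows for each pair
counts = df.groupby("label")["letter"].value_counts()
print(counts)

# Letter counts for label 3 only
print(counts.loc[3])
```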

I tried a pandas groupby:

`df['label_pred'] = df.groupby(["letter", "label"]).count().max()`

but it returns an empty Series.
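A sketch of why the attempt comes back empty, assuming current pandas behavior: both columns are used as group keys, so `count()` has no remaining columns to count and returns a frame with zero columns, and `.max()` over that is an empty Series. Counting rows per pair with `size()` keeps the counts, and `idxmax` can then pick the most frequent letter per label. The names `pair_counts` and `best_letter` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "letter": ["a", "b", "c", "a", "a", "c", "b", "a"],
    "label": [3, 2, 3, 1, 1, 3, 2, 5],
})

# The original attempt: no non-key columns remain, so nothing is counted
empty = df.groupby(["letter", "label"]).count()
print(empty.shape)          # zero columns
print(empty.max())          # max() over zero columns is an empty Series

# size() counts the rows in each (letter, label) group instead
pair_counts = df.groupby(["letter", "label"]).size()

# Labels as rows, letters as columns; take the letter with the highest count
# (idxmax returns the first column on ties, i.e. the alphabetically first letter)
best_letter = pair_counts.unstack("letter", fill_value=0).idxmax(axis=1)

# Look up each row's label in the label -> letter mapping
df["label_pred"] = df["label"].map(best_letter)
print(df)
```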

Andrew
  • The logic is unclear but it looks like `df.groupby('label')['letter'].transform('last')` (or `'max'`?) – mozway Apr 06 '23 at 05:37
  • Is it transforming by the max count? I mean that, for each label, whichever letter occurs most often with it becomes the prediction. – Andrew Apr 06 '23 at 05:48
  • No, you need the mode; I edited. Also see `df.groupby('label')['letter'].transform(lambda x: x.mode()[0])` – mozway Apr 06 '23 at 05:53
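A minimal sketch of the mode-based `transform` from the comments, assuming pandas. It uses `.iloc[0]` (positional) in place of `[0]`, which gives the same result here because `Series.mode()` returns a fresh 0-based index. It also shows that `transform('last')` only happens to match on the sample data: in the hypothetical frame `other`, c is the majority letter for label 3 but a is the last one.

```python
import pandas as pd

df = pd.DataFrame({
    "letter": ["a", "b", "c", "a", "a", "c", "b", "a"],
    "label": [3, 2, 3, 1, 1, 3, 2, 5],
})

# Most frequent letter per label, broadcast back to every row of that label.
# Series.mode() returns its values sorted, so a tie goes to the
# alphabetically first letter.
df["label_pred"] = df.groupby("label")["letter"].transform(lambda s: s.mode().iloc[0])
print(df)

# Hypothetical data where 'last' and the mode disagree
other = pd.DataFrame({"letter": ["c", "c", "a"], "label": [3, 3, 3]})
by_last = other.groupby("label")["letter"].transform("last")
by_mode = other.groupby("label")["letter"].transform(lambda s: s.mode().iloc[0])
print(by_last.tolist(), by_mode.tolist())
```

The lambda is called once per label group, which is fine for small frames; on very large data the `size()`/`idxmax` approach avoids the Python-level call per group.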
