I have a DataFrame that looks like this:

id OUTCOME
A    0
A    1
A    0
B    0
B    0
B    0
C    0
C    1
C    1

How can I re-assign the outcome values so that they are equal to the maximum value for each group? In other words, the outcome should look like this:

id OUTCOME
A    1
A    1
A    1
B    0
B    0
B    0
C    1
C    1
C    1

I have tried doing this:

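import pandas as pd

id_tuple = ('A', 'B', 'C')
g = df.groupby('id')
df2 = pd.DataFrame()  # accumulator for the per-group results
for item in id_tuple:
    new_df = g.get_group(item).copy()  # copy so the assignment does not touch a view of df
    new_df['OUTCOME'] = new_df['OUTCOME'].max()
    df2 = pd.concat([df2, new_df], axis=0)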

This is taking a very long time, so I am looking for a better way. I appreciate your advice!

Andrei A.

1 Answer

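You can group by the 'id' column and then call .transform('max') on the OUTCOME column. The result has one value per original row, aligned to the original index, so it can be assigned straight back: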

df['OUTCOME'] = df.groupby('id')['OUTCOME'].transform('max')

We then obtain:

>>> df
  id  OUTCOME
0  A        1
1  A        1
2  A        1
3  B        0
4  B        0
5  B        0
6  C        1
7  C        1
8  C        1
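
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the sample data from the question is rebuilt by hand as below (the construction of df is my assumption, since the question does not show how the frame was created):

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed construction).
df = pd.DataFrame({
    "id": list("AAABBBCCC"),
    "OUTCOME": [0, 1, 0, 0, 0, 0, 0, 1, 1],
})

# transform('max') broadcasts each group's maximum back onto every row
# of that group, keeping the original index, so no loop or concat is needed.
df["OUTCOME"] = df.groupby("id")["OUTCOME"].transform("max")

print(df["OUTCOME"].tolist())  # [1, 1, 1, 0, 0, 0, 1, 1, 1]
```

The loop in the question is likely slow because it calls pd.concat once per group, which copies the accumulated frame each time; the single transform call avoids that.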
Willem Van Onsem