-1

I have a data frame like below

    ID   sales
0   c1   100.0
1   c1    25.0
2   c1    60.0
3   c1    inf
4   c2    40.0
5   c2    inf
6   c3    50.0
7   c3    inf
8   c3    80.0

I want to replace the 'inf' in sales column to the max value of the group by ID column

So the output should look like something below

  ID    sales
0   c1  100.0
1   c1   25.0
2   c1   60.0
3   c1  100.0
4   c2   40.0
5   c2   40.0
6   c3   50.0
7   c3   80.0
8   c3   80.0

what the best way to do it?

Thank you

yatu
  • 86,083
  • 12
  • 84
  • 139
suma
  • 49
  • 2

1 Answers1

0
import numpy as np
# skip inf records
max_df = df[df['sales'] != np.inf]
# group by ID without inf
for sales_id, id_df in max_df.groupby('ID'):
    # search in original df by ID + inf and set sales to max value of subgroup
    df.loc[(df['sales'] == np.inf) & (df['ID'] == sales_id), 'sales'] = id_df['sales'].max()

print(df)
#    ID  sales
# 0  c1  100.0
# 1  c1   25.0
# 2  c1   60.0
# 3  c1  100.0
# 4  c2   40.0
# 5  c2   40.0
# 6  c3   50.0
# 7  c3   80.0
# 8  c3   80.0
Danila Ganchar
  • 10,266
  • 13
  • 49
  • 75