2

I have a dataframe as shown. Using python, I want to get the sum of 'Value' for each 'Id' group upto the first occurrence of 'Stage' 12.

df = pd.DataFrame({'Id':[1,1,1,2,2,2,2],
               'Date': ['2020-04-23', '2020-04-25', '2020-04-28', '2020-04-20', '2020-05-01', '2020-05-05', '2020-05-12'],
               'Stage': [11, 12, 15, 11, 14, 12, 12],
               'Value': [5, 4, 6, 12, 2, 8, 3]})

Id  Date      Stage Value
 1  2020-04-23  11    5
 1  2020-04-25  12    4
 1  2020-04-28  15    6
 2  2020-04-20  11   12
 2  2020-05-01  14    2
 2  2020-08-05  12    8
 2  2020-05-12  12    3

My desired output:

Id  Value
 1  9
 2  22

Would be very thankful if someone could help.

sak
  • 59
  • 4

1 Answers1

2

Let us try use the groupby transform idxmax filter the dataframe , then do another round of groupby

idx = df['Stage'].eq(12).groupby(df['id']).transform('idxmax')
output = df[df.index <= idx].groupby('id')['Value'].sum().reset_index()

Detail

the transform with idxmax will return the first index match with 12 for all the groupby row, then we need to filter the df with index less than that to get the data until the first 12 show up.

BENY
  • 317,841
  • 20
  • 164
  • 234
  • @AkshaySehgal ...why you think this not work .....idxmax return the first 12's index ... – BENY Sep 02 '20 at 02:07