1

I need help with pivoting my df to get the start and end day.

Id  Day   Value
111 6     a
111 5     a
111 4     a 
111 2     a
111 1     a
222 3     a
222 2     a
222 1     a
333 1     a

The desired result would be:

Id  StartDay  EndDay
111 4         6
111 1         2       (since 111 skips day 3)
222 1         3
333 1         1

Thanks a bunch!

TylerNG
  • 919
  • 2
  • 9
  • 23

1 Answers1

1

So, my first thought was just :

df.groupby('Id').Day.agg(['min','max'])

But then I noticed your stipulation "(since 111 skips day 3)", which means we have to make an identifier which tells us if the current row is in the same 'block' as the previous (same Id, contiguous Day). So, we sort:

df.sort_values(['Id','Day'], inplace=True)

Then define the block:

df['block'] = ((df.Day!=(df.shift(1).Day+1).fillna(0).astype(int))).astype(int).cumsum()

(adapted from top answer to this question: Finding consecutive segments in a pandas data frame)

then group by Id and block:

df.groupby(['Id','block']).Day.agg(['min','max'])

Giving:

Id  block   min max
111 1       1   2
111 2       4   6
222 3       1   3
333 4       1   1
greg_data
  • 2,247
  • 13
  • 20