2

I'm just trying to understand grouper a little more. I know calling the frequency makes some functions void. But I was wondering if there was a work around to count the last segment when using grouper. For example, I want the max count to include the 15min segment between 13:30:00 and 13:45:00 for the df below.

df = pd.DataFrame({
    'Time' : ['1904-01-01 13:00:00','1904-01-01 13:10:00','1904-01-01 13:15:00','1904-01-01 13:25:00','1904-01-01 13:35:00'],                 
    'Number' : [2,2,1,1,1],                      
    })

df['Time'] = pd.to_datetime(df['Time'])

df = df.groupby(pd.Grouper(freq='15T', key='Time', closed = 'left'))['Number'].max().ffill()
df = df.reset_index(level=['Time'])

Out:

                 Time  Number
0 1904-01-01 13:00:00       2
1 1904-01-01 13:15:00       1
2 1904-01-01 13:30:00       1

This can be achieved by using the label = 'right'. But the output is:

                 Time  Number
0 1904-01-01 13:15:00       2
1 1904-01-01 13:30:00       1
2 1904-01-01 13:45:00       1

I'm hoping to return the df:

                 Time  Number
0 1904-01-01 13:00:00       2
1 1904-01-01 13:15:00       1
2 1904-01-01 13:30:00       1
3 1904-01-01 13:45:00       1
jonboy
  • 415
  • 4
  • 14
  • 45

1 Answers1

3

I get different output in pandas 0.24.2:

df = df.groupby(pd.Grouper(freq='15T', key='Time', closed = 'right'))['Number'].max().ffill()
df = df.reset_index(level=['Time'])
print (df)
                 Time  Number
0 1904-01-01 12:45:00       2
1 1904-01-01 13:00:00       2
2 1904-01-01 13:15:00       1
3 1904-01-01 13:30:00       1

Labels is possible shift by label='right', but values are not changed:

df = (df.groupby(pd.Grouper(freq='15T', key='Time', label='right', closed = 'right'))['Number']
       .max().ffill())

#resample alternative
df2 = df.resample('15T', on='Time', label='right', closed='right')['Number'].max().ffill()
df2 = df.reset_index(level=['Time'])
print (df2)
                 Time  Number
0 1904-01-01 13:00:00       2
1 1904-01-01 13:15:00       2
2 1904-01-01 13:30:00       1
3 1904-01-01 13:45:00       1

More information are found in how to understand closed and label arguments in pandas resample method?, because same logic is used for Grouper and most parameters are same.

Possible solution for your problem is add new row with shifted max datetime by 15 Min:

df1 = pd.DataFrame({'Time': df['Time'].max() + pd.Timedelta('15T'),
                    'Number':df['Number'].iat[-1]}, index=[0])
print (df1)
                 Time  Number
0 1904-01-01 13:50:00       1

df = df.append(df1).groupby(pd.Grouper(freq='15T', key='Time'))['Number'].max()
df = df.reset_index(level=['Time'])

print (df)
                 Time  Number
0 1904-01-01 13:00:00       2
1 1904-01-01 13:15:00       1
2 1904-01-01 13:30:00       1
3 1904-01-01 13:45:00       1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 2
    Thanks @jezrael. I did consider adding a new row at the end of the df. I just didn't know if there was a function I was missing that could perform the same task. – jonboy Aug 03 '19 at 07:48