1

I am having some trouble with pandas rolling. Here a simplify version of my dataset:

df2 = pd.DataFrame({  
    'A' : pd.Categorical(["test","train","test","train",'train','hello']), 
    'B' : (pd.Timestamp('2013-01-02 00:00:05'),
                   pd.Timestamp('2013-01-02 00:00:10'),
                   pd.Timestamp('2013-01-02 00:00:09'),
                   pd.Timestamp('2013-01-02 00:01:05'),
                   pd.Timestamp('2013-01-02 00:01:25'),
                   pd.Timestamp('2013-01-02 00:02:05')),
         'C' : 1.}).sort_values('A').reset_index(drop=True)
>>> df2
       A                   B    C
0  hello 2013-01-02 00:02:05  1.0
1   test 2013-01-02 00:00:05  1.0
2   test 2013-01-02 00:00:09  1.0
3  train 2013-01-02 00:00:10  1.0
4  train 2013-01-02 00:01:05  1.0
5  train 2013-01-02 00:01:25  1.0

I would like to have a rolling window of 10s, to get the following output:

       A  count
0  hello   1
1   test   2
3  train   1

I try the groupby and rolling.

df2.groupby('A').rolling('10s', on='B', closed='right').C.sum() 

I get the rolling windows from the past '10s' observation, which is not what i am looking for:

A      B                  
hello  2013-01-02 00:02:05    1.0
test   2013-01-02 00:00:05    1.0
       2013-01-02 00:00:09    2.0
train  2013-01-02 00:00:10    1.0
       2013-01-02 00:01:05    1.0
       2013-01-02 00:01:25    1.0 

I also try resampling, but I am not able to get the result.

grouped = df3.set_index('B').groupby('A').resample('S' )['C'].count()
grouped.reset_index().groupby('A').rolling(window=10,on='B' , min_periods=1).sum() 
youpi
  • 75
  • 1
  • 8
  • In `df2` the last 3 "train" rows have `B` values which are more than 10 seconds apart. So the rolling sum should have 3 rows as well. Can you explain why your desired result only has 1 "train" row? – unutbu Sep 28 '17 at 19:21

2 Answers2

0

I think you have to try this:

df2.groupby('A').rolling('11s', on='B').agg({'C': 'sum'}).groupby('A').max()
Wes
  • 1
  • 1
  • It will give the same output as df2.groupby('A').rolling('10s', on='B', closed='right').C.sum() which is not what I am looking for. – youpi Sep 28 '17 at 16:42
-1

This might do the trick on this example, but i'm not sure it's a good enough solution :

df2.groupby('A').rolling('10s', on='B').agg({'C': 'sum'}).groupby("A").max()
Sami
  • 1