
How can I set a 5-minute window size to resample the data with a rolling mean? I tried different ways, but I always get an error about the DatetimeIndex, even though my data has a DatetimeIndex as its index.

Does anyone know the right way to do it?

Data

                index   Speed   rolling_meanVal
DateTime            
1/1/2011 0:04   2165    0.057579    NaN
1/1/2011 0:07   3438    0.044646    NaN
1/1/2011 0:10   4713    0.043154    NaN
1/1/2011 0:13   6054    0.014403    NaN
1/1/2011 0:16   7385    0.038972    0.039751
1/1/2011 0:19   8734    0.019927    0.036447
1/1/2011 0:21   10045   0.039548    0.03689
1/1/2011 0:24   11374   0.089709    0.043492
1/1/2011 0:27   12661   0.102816    0.050084
1/1/2011 0:30   13960   0.119699    0.057045
1/1/2011 0:33   15261   0.095108    0.060505
1/1/2011 0:36   16579   0.051854    0.059784
1/1/2011 0:40   17848   0.035654    0.057928
1/1/2011 0:43   19163   0.083695    0.059769
1/1/2011 0:46   20458   0.091149    0.061861
1/1/2011 0:49   21784   0.082233    0.063134
1/1/2011 0:52   23105   0.043388    0.061972
1/1/2011 0:55   24415   0.032073    0.060311
1/1/2011 0:58   25689   0.108548    0.06285
1/1/2011 0:59   27117   0.140965    0.066756
1/1/2011 1:02   28492   0.029816    0.065368
1/1/2011 1:05   29861   0.028124    0.064542
1/1/2011 1:09   31195   0.042464    0.064507
1/1/2011 1:12   32471   0.065898    0.067082
1/1/2011 1:15   33793   0.128899    0.071578
1/1/2011 1:18   35094   0.019488    0.071556
1/1/2011 1:21   36407   0.041034    0.071631
1/1/2011 1:24   37728   0.038828    0.069087
1/1/2011 1:27   39053   0.039328    0.065912
1/1/2011 1:30   40340   0.080378    0.063946

Above is the sample data. I want to take the rolling mean with a window size of 5 minutes. I tried this code:

result_frame['Speed'].rolling(window=20,min_periods=5).mean().rename('rollingmenaVal')

but I don't understand how to set the window to a frequency of 5 minutes. Any help?

cs95
id101112

1 Answer


Your window is going to be '5T', for 5 minutes ('5min' is an equivalent alias):

df['rollingmeanVal'] = df.rolling('5T').Speed.mean()

                     index     Speed  rollingmeanVal
DateTime                                            
2011-01-01 00:04:00   2165  0.057579        0.057579
2011-01-01 00:07:00   3438  0.044646        0.051112
2011-01-01 00:10:00   4713  0.043154        0.043900
2011-01-01 00:13:00   6054  0.014403        0.028779
2011-01-01 00:16:00   7385  0.038972        0.026687
2011-01-01 00:19:00   8734  0.019927        0.029449
2011-01-01 00:21:00  10045  0.039548        0.029738
2011-01-01 00:24:00  11374  0.089709        0.064629
2011-01-01 00:27:00  12661  0.102816        0.096263
2011-01-01 00:30:00  13960  0.119699        0.111258
2011-01-01 00:33:00  15261  0.095108        0.107404
2011-01-01 00:36:00  16579  0.051854        0.073481
2011-01-01 00:40:00  17848  0.035654        0.043754
2011-01-01 00:43:00  19163  0.083695        0.059675
2011-01-01 00:46:00  20458  0.091149        0.087422
2011-01-01 00:49:00  21784  0.082233        0.086691
2011-01-01 00:52:00  23105  0.043388        0.062811
2011-01-01 00:55:00  24415  0.032073        0.037731
2011-01-01 00:58:00  25689  0.108548        0.070311
2011-01-01 00:59:00  27117  0.140965        0.093862
2011-01-01 01:02:00  28492  0.029816        0.093110
2011-01-01 01:05:00  29861  0.028124        0.028970
2011-01-01 01:09:00  31195  0.042464        0.035294
2011-01-01 01:12:00  32471  0.065898        0.054181
2011-01-01 01:15:00  33793  0.128899        0.097399
2011-01-01 01:18:00  35094  0.019488        0.074194
2011-01-01 01:21:00  36407  0.041034        0.030261
2011-01-01 01:24:00  37728  0.038828        0.039931
2011-01-01 01:27:00  39053  0.039328        0.039078
2011-01-01 01:30:00  40340  0.080378        0.059853
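
For anyone who wants to reproduce this, here is a minimal sketch using only the first four rows of the question's data. It assumes the index has already been parsed into a DatetimeIndex and uses '5min', which is the same offset as '5T' (newer pandas releases deprecate the 'T' spelling). With an offset window, each row's window covers the interval (t - 5 minutes, t], and `min_periods` defaults to 1, which is why there are no leading NaN values.

```python
import pandas as pd

# First four rows of the question's sample data (Speed column only).
df = pd.DataFrame(
    {"Speed": [0.057579, 0.044646, 0.043154, 0.014403]},
    index=pd.to_datetime(
        [
            "2011-01-01 00:04",
            "2011-01-01 00:07",
            "2011-01-01 00:10",
            "2011-01-01 00:13",
        ]
    ),
)
df.index.name = "DateTime"

# Offset-based windows need a sorted (monotonic) DatetimeIndex.
df = df.sort_index()

# '5min' is the same offset as '5T'; the window for each row is (t - 5min, t].
df["rollingmeanVal"] = df["Speed"].rolling("5min").mean()
print(df)
```

The second row, for example, averages the 00:04 and 00:07 readings, which matches the 0.051112 shown in the output above.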
DJK
  • I tried that but I get the following error message. `ValueError: cannot reindex from a duplicate axis` – id101112 Mar 03 '18 at 23:47
  • @Ravi, do you have repeated datetime values in your index? – DJK Mar 04 '18 at 00:27
  • I used this condition to get rid of the duplicate values: `np.where(df1['CheckID'] != True, df1.rolling('10T').Speed.mean(), np.nan)`, but I am getting the error `ValueError: index must be monotonic`. What does this mean? Is there any solution? – id101112 Mar 05 '18 at 17:00
  • @Ravi, it would appear that your datetime values are not sorted – DJK Mar 05 '18 at 18:01
  • Those are sorted by datetime within each id. The data you can see is for only one id; when that id ends, the next id starts with the same datetime values. That's why I am using np.where, so it runs until the id finishes and starts again where the id changes. Or is my condition wrong? If it is, can you correct it? – id101112 Mar 05 '18 at 20:21
  • If you have some unique ids then you should do a groupby operation as well, i.e. `df1.groupby('id').rolling('10T').Speed.mean()`; just make sure you have an ascending sort before you do this, i.e. `df1.sort_values(['id', 'DateTime'])` – DJK Mar 05 '18 at 20:53
  • Thank you so much. I had tried it the other way around, `df1['Speed'].rolling(20, '5min').mean()`, following the documented method, and that was not working for me, but your method works. Thanks – id101112 Mar 06 '18 at 14:56
  • Where are you getting that `T` is equivalent to minutes? These values are not listed in the pandas docs for [Series.rolling](https://pandas.pydata.org/docs/reference/api/pandas.Series.rolling.html) – Jamie Marshall Nov 29 '21 at 16:11
  • @JamieMarshall, you can find the list [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases) – DJK Nov 29 '21 at 21:14
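
The per-id case discussed in the comments can be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual data: the `id` column name, its values, and the Speed numbers are all made up, and the timestamps are repeated for each id to mimic the situation described. Sorting by id and then by time keeps each group's timestamps increasing, which is what the time-based window needs.

```python
import pandas as pd

# Hypothetical data: two ids that share the same timestamps.
times = pd.to_datetime(["2011-01-01 00:04", "2011-01-01 00:07", "2011-01-01 00:10"])
df1 = pd.DataFrame(
    {
        "id": ["a", "a", "a", "b", "b", "b"],  # assumed column name
        "Speed": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    },
    index=pd.DatetimeIndex(list(times) * 2, name="DateTime"),
)

# Sort by id, then by time, and keep the timestamps as the index.
ordered = (
    df1.reset_index()
    .sort_values(["id", "DateTime"])
    .set_index("DateTime")
)

# Rolling 5-minute mean computed separately for each id.
rolled = ordered.groupby("id")["Speed"].rolling("5min").mean()

# The result is indexed by (id, DateTime). Because `ordered` is already sorted
# by id and time, its rows line up positionally with `rolled`, so assigning
# the raw values avoids aligning on the duplicated timestamps.
ordered["rollingmeanVal"] = rolled.to_numpy()
print(ordered)
```

Assigning by position is a deliberate choice here: the timestamps repeat across ids, so aligning on the index is the kind of operation that can raise `cannot reindex from a duplicate axis`.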