
I have quite a big dataset with the following columns: Timestamp, Avg_Spend. The timestamps have millisecond resolution but are not at regular intervals, e.g. one hour may contain 1000 observations, another 2000, and so on.

I have to calculate a rolling standard deviation over a certain period, say 5 hours. I tried using runSD (from caTools), which works very well on a fixed number of observations, but I am at a loss as to how to get it to work for rolling time periods, which may contain a varying number of observations. I can write custom loops, but those take a long time. The solution has to be vectorized somehow.

Any suggestions?

Ronak Shah
MaxiMus
  • How should the window move? From observation to observation? Or at fixed intervals? E.g. first window from 01:00 to 06:00, second from 01:05 to 06:05. – Thierry Oct 12 '15 at 09:36
  • The window will move from observation to observation – MaxiMus Oct 12 '15 at 09:41
  • Have a look at this other [SO question](http://stackoverflow.com/questions/12021171/rolling-computations-in-xts-by-month). – Paul Hiemstra Oct 12 '15 at 10:35
  • Paul, thanks for the link. In the other question they can use all the daily data from the end point. That is a bit different and perhaps easier than my situation, as I need to find the point that marks the start of my moving window in minutes. I finally managed to solve it; it takes a bit of time (15 minutes for 10 million rows) but gets the work done. I just wrote a custom loop where, for each row, I get the first point in the window. Once I have the points, I use the vectorized sd(abc[i:j]) to calculate the SD. This seems faster than applying the formula for each row. – MaxiMus Oct 12 '15 at 16:02
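
The approach in the last comment (find the first row inside each window, then take the SD of rows i:j) can probably be vectorized without a per-row loop. The following is only a minimal sketch under some assumptions: the timestamps are numeric and already sorted, the window is trailing and closed (`[t - width, t]`), and the names `rolling_sd_time`, `ts`, `x` and `width` are made up for illustration. It uses base R's `findInterval` to locate each window start and prefix sums to get the variance of each window; this is a swapped-in technique, not the loop the asker actually wrote.

```r
# Rolling SD over a trailing time window, one window per observation.
# Assumes `ts` is a sorted numeric timestamp vector (e.g. milliseconds) and
# `x` is the matching value vector; all names here are illustrative.
rolling_sd_time <- function(ts, x, width) {
  stopifnot(length(ts) == length(x), !is.unsorted(ts))
  n <- length(ts)
  # With left.open = TRUE, findInterval() counts the elements of `ts` that are
  # strictly below each lower bound, so adding 1 gives the index of the first
  # observation at or after ts[i] - width.
  start <- findInterval(ts - width, ts, left.open = TRUE) + 1L
  end <- seq_len(n)
  # Prefix sums of centred values; centring reduces cancellation error.
  xc <- x - mean(x)
  s1 <- c(0, cumsum(xc))
  s2 <- c(0, cumsum(xc^2))
  k <- end - start + 1L                  # number of observations per window
  sum1 <- s1[end + 1L] - s1[start]
  sum2 <- s2[end + 1L] - s2[start]
  v <- (sum2 - sum1^2 / k) / (k - 1)     # sample variance, as sd() uses
  v[k < 2] <- NA_real_                   # SD is undefined for a single point
  sqrt(pmax(v, 0))                       # clamp tiny negative rounding errors
}
```

For a 5-hour window on millisecond timestamps, `width` would be `5 * 60 * 60 * 1000`, e.g. `rolling_sd_time(df$Timestamp, df$Avg_Spend, 5 * 60 * 60 * 1000)` for a hypothetical data frame `df` whose `Timestamp` column is numeric. Because the variance comes from differences of cumulative sums, rounding error can grow on very long or very large-valued series, so it is worth checking a few windows against `sd(x[j:i])` directly.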

0 Answers