7

I want to aggregate data by hourly mean. Daily is very easy:

apply.daily(X2,mean)

Why is there no function for hourly? I tried

hr.means <- aggregate(X2, format(X2["timestamp"],"%Y-%m-%d %H"))

and got always error with trim argument. Is there an easier function similar to apply.daily? What if I want to aggregate the mean of 5 minutes. Data are values per minute:

"timestamp", value 
"2012-04-09 05:03:00",2
"2012-04-09 05:04:00",4
"2012-04-09 05:05:00",5
"2012-04-09 05:06:00",0
"2012-04-09 05:07:00",0
"2012-04-09 05:08:00",3
"2012-04-09 05:09:00",0
"2012-04-09 05:10:00",1

I am using xts and zoo.

Mark Rajcok
  • 362,217
  • 114
  • 495
  • 492
Herr Student
  • 853
  • 14
  • 26

3 Answers3

17

try

period.apply(X2, endpoints(X2, "hours"), mean)

apply.daily is simply a wrapper for the above:

> apply.daily
function (x, FUN, ...)
{
    ep <- endpoints(x, "days")
    period.apply(x, ep, FUN, ...)
}
eddi
  • 49,088
  • 6
  • 104
  • 155
  • Thank you so far, any suggestions for the periodicity of 5 minutes? – Herr Student Apr 15 '13 at 16:09
  • 1
    I get: 05:04:00 - 4; 05:09:00 - 5,... but maybe it is possible to set the first value to 05:00:00 and go on with 05:05:00 might be easier, if I am merging files later to have the same start and timestep. Thanks – Herr Student Apr 15 '13 at 17:14
  • I see. You can probably fiddle around with the output of `endpoints`, or just look at what `endpoints` prints out (it's just a list of indices) and then construct a similar list with whatever points you like instead. – eddi Apr 15 '13 at 18:06
2
hr.means <- aggregate(X2, format(time(X2),"%y-%m-%d %H"), mean) 

This should work fine.

rink.attendant.6
  • 44,500
  • 61
  • 101
  • 156
athlonshi
  • 1,711
  • 1
  • 19
  • 23
1

Answering part 2:

What if I want to aggregate the mean of 5 minutes?

As @eddit already mentioned in a comment above:

df <- read.table(header=TRUE, sep=",", stringsAsFactors=FALSE, text="
timestamp, value 
2012-04-09 05:03:00,2
2012-04-09 05:04:00,4
2012-04-09 05:05:00,5
2012-04-09 05:06:00,0
2012-04-09 05:07:00,0
2012-04-09 05:08:00,3
2012-04-09 05:09:00,0
2012-04-09 05:10:00,1")
X2 <- xts(df$value, as.POSIXct(df$timestamp))

X2.5min <- period.apply(X2, endpoints(X2, "minutes", 5), mean)

I get: 05:04:00 - 4; 05:09:00 - 5,... but maybe it is possible to set the first value to 05:00:00 and go on with 05:05:00 might be easier, if I am merging files later to have the same start and timestep.

Indeed:

> X2.5min
                    [,1]
2012-04-09 05:04:00  3.0
2012-04-09 05:09:00  1.6
2012-04-09 05:10:00  1.0

Darren Cook over at Cross Validated faced the same issue and wrote function align.time.down:

align.time.down=function(x,n){index(x)=index(x)-n;align.time(x,n)}

That can be used to adjust the times down:

X2.5mindown <- align.time.down(X2.5min, 5 * 60)
X2.5mindown
                    [,1]
2012-04-09 05:00:00  3.0
2012-04-09 05:05:00  1.6
2012-04-09 05:10:00  1.0
Community
  • 1
  • 1
Mark Rajcok
  • 362,217
  • 114
  • 495
  • 492
  • As I commented to Darren, it seems like a bad idea to align observations with a time *prior* to their occurrence, especially if you're going to merge the output of the aggregation with another time series. – Joshua Ulrich Aug 21 '15 at 23:39