merging aggregate data in R (again)

Question

Following up on my earlier question with the same title: I have long-term sub-hourly data that I want to aggregate in various ways. I want aggregates by hour of the day, but also by combinations of keys, for example day-of-week and hour (i.e. Sunday 1am, Sunday 2am, etc.). Another example would be weekend-or-weekday and hour.

The example below shows the two kinds of aggregation I have managed so far, which leave me with two zoo objects. What I want to do next is merge each aggregate back into the original data, so that I can compare the error of aggregation. This is where I am stuck at the moment.

Note that I do not use the solution in the previous question because I want the flexibility of aggregation.

Here is the snippet that shows what I tried so far. Any help would be greatly appreciated.

library(zoo)
Lines <- "Index,light.kw
2013-06-14 13:00:00,3.436
2013-06-14 13:15:00,3.327
2013-06-14 13:30:00,3.319
2013-06-14 13:45:00,3.308
2013-06-14 14:00:00,3.458
2013-06-14 14:15:00,3.452
2013-06-14 14:30:00,3.445
2013-06-14 14:45:00,3.469
2013-06-14 15:00:00,3.468
2013-06-14 15:15:00,3.427
2013-06-14 15:30:00,3.168
2013-06-14 15:45:00,2.383
2013-06-15 13:00:00,0.555
2013-06-15 13:15:00,0.555
2013-06-15 13:30:00,0.555
2013-06-15 13:45:00,0.555
2013-06-15 14:00:00,0.555
2013-06-15 14:15:00,0.555
2013-06-15 14:30:00,0.555
2013-06-15 14:45:00,0.719
2013-06-15 15:00:00,0.976
2013-06-15 15:15:00,0.981
2013-06-15 15:30:00,1.116
2013-06-15 15:45:00,0.59"
con <- textConnection(Lines)
z <- read.zoo(con, header=TRUE, sep=",",
     format="%Y-%m-%d %H:%M:%S", FUN=as.POSIXct)
close(con)

index.hourly = format(index(z), "%H")
z.hourly = aggregate(z, index.hourly, mean)
z.hourly
merge(z,z.hourly)

index.dayhour = format(index(z), "%w %H")
z.dayhour = aggregate(z, index.dayhour, mean)
z.dayhour
merge(z,z.dayhour)
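The weekend-or-weekday-hourly key mentioned in the question is not shown in the snippet above. A minimal sketch of one way to build it, assuming a four-row stand-in series taken from the sample data and the hypothetical names `z.demo`, `is.weekend`, `index.wkhour` and `z.wkhour` (none of which appear in the original post):

```r
library(zoo)

# Stand-in series: two Friday and two Saturday readings from the sample data.
# tz = "GMT" is an assumption made so the example does not depend on the
# local time zone.
idx <- as.POSIXct(c("2013-06-14 13:00:00", "2013-06-14 13:15:00",
                    "2013-06-15 13:00:00", "2013-06-15 13:15:00"), tz = "GMT")
z.demo <- zoo(c(3.436, 3.327, 0.555, 0.555), idx)

# "%w" gives the weekday as a digit: 0 = Sunday, ..., 6 = Saturday
is.weekend <- format(index(z.demo), "%w") %in% c("0", "6")

# Combine the day type with the hour to form the aggregation key
index.wkhour <- paste(ifelse(is.weekend, "weekend", "weekday"),
                      format(index(z.demo), "%H"))

# Aggregate exactly as with the other keys
z.wkhour <- aggregate(z.demo, index.wkhour, mean)
z.wkhour
```

Any key built this way is just a character vector with one entry per observation, so it can be swapped in wherever `index.hourly` or `index.dayhour` is used.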

How would you control the error of aggregation, by merging with the original data? — agstudy, Jul 25 '13 at 02:03
You would probably want code like this to work but I suspect it might need an intermediate column to be constructed: `merge(z,z.hourly, by.x=format(index(z),"%H") )` ; `merge(z,z.dayhour, by.x=format(index(z), "%w %H") )` — IRTFM, Jul 25 '13 at 02:08
@agstudy Not to "control the error", I just want to have a side-by-side comparison between the original data and the aggregate, so that I could use the aggregate as a predictor of the original data. — ery, Jul 25 '13 at 16:28
@DWin Thanks, I developed a solution based on this suggestion. — ery, Jul 25 '13 at 16:28

score 0 · Answer 1 · edited May 23 '17 at 12:29

Based on DWin's suggestion above, here is the solution I found. Merging on an intermediate column, as DWin suggested, does not work with zoo objects, so this solution converts the zoo objects to data frames and does the merge there. Here it is:

library(zoo)
Lines <- "Index,light.kw
2013-06-14 13:00:00,3.436
2013-06-14 13:15:00,3.327
2013-06-14 13:30:00,3.319
2013-06-14 13:45:00,3.308
2013-06-14 14:00:00,3.458
2013-06-14 14:15:00,3.452
2013-06-14 14:30:00,3.445
2013-06-14 14:45:00,3.469
2013-06-14 15:00:00,3.468
2013-06-14 15:15:00,3.427
2013-06-14 15:30:00,3.168
2013-06-14 15:45:00,2.383
2013-06-15 13:00:00,0.555
2013-06-15 13:15:00,0.555
2013-06-15 13:30:00,0.555
2013-06-15 13:45:00,0.555
2013-06-15 14:00:00,0.555
2013-06-15 14:15:00,0.555
2013-06-15 14:30:00,0.555
2013-06-15 14:45:00,0.719
2013-06-15 15:00:00,0.976
2013-06-15 15:15:00,0.981
2013-06-15 15:30:00,1.116
2013-06-15 15:45:00,0.59"
con <- textConnection(Lines)
z <- read.zoo(con, header=TRUE, sep=",",
     format="%Y-%m-%d %H:%M:%S", FUN=as.POSIXct)
close(con)

# make the index for aggregation
index.hourly <- format(index(z), "%H")
# make the aggregate
z.hourly = aggregate(z, index.hourly, mean, na.rm=TRUE)

# make a data frame from the original zoo,
# but the data frame must include the index.hourly
# so that later we can merge the data frame based
# on this index.
# Build it directly from the parts of the zoo object:
# merging z with a character zoo of the keys would
# coerce the numeric data to character, because a zoo
# object stores all of its columns in a single matrix.
df1 = data.frame(z = coredata(z),
                 z.index.hourly = index.hourly,
                 stringsAsFactors = FALSE)
# add the timestamp as a column
# as we will need the timestamp to rebuild the zoo object.
df1$Index = format(index(z), "%Y-%m-%d %H:%M:%S")

# make a dataframe of the aggregate zoo
df2 = as.data.frame(z.hourly)
df2$Index = row.names(df2)

# merge the two data frame
df3 = merge(df1,df2,by.x="z.index.hourly",by.y="Index",all.x=TRUE)
df3 = df3[order(df3$Index),]
summary(df3)

# make a zoo object containing the original data and the aggregate
z.merged.agg = zoo(df3[,c(2,4)],as.POSIXct(df3$Index, tz="GMT"))
z.merged.agg
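The data frame round trip can probably be avoided by looking up each observation's aggregate with `match()` on the key. This is only a sketch under stated assumptions: a four-row stand-in series taken from the sample data, `tz = "GMT"`, and the hypothetical names `agg`, `z.side`, `hourly.mean` and `rmse`, none of which come from the original posts.

```r
library(zoo)

# Stand-in series: 13:00 and 14:00 readings on both days of the sample data
idx <- as.POSIXct(c("2013-06-14 13:00:00", "2013-06-14 14:00:00",
                    "2013-06-15 13:00:00", "2013-06-15 14:00:00"), tz = "GMT")
z <- zoo(c(3.436, 3.458, 0.555, 0.555), idx)

# Key and aggregate, as in the answer above
index.hourly <- format(index(z), "%H")
z.hourly <- aggregate(z, index.hourly, mean, na.rm = TRUE)

# For every observation, find the position of its key in the aggregate's
# index and pull out the matching aggregate value
agg <- coredata(z.hourly)[match(index.hourly, index(z.hourly))]

# Original data and aggregate side by side, on the original timestamps
z.side <- zoo(cbind(light.kw = coredata(z), hourly.mean = agg), index(z))
z.side

# One possible error measure when the aggregate is used as a predictor
rmse <- sqrt(mean((coredata(z) - agg)^2))
```

The same lookup should work for any key (for example `index.dayhour`), since it relies only on the key vector and on the index of the aggregated object. For a plain mean, `ave(coredata(z), index.hourly)` from base R should give the same aligned values in one step.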
