I just can't get this trivial behaviour to work. I've included a lot of examples to show the problem.
I have minute bar data and I'd like to group every 15 bars and apply a function to each group, producing 15-minute bar data. The timestamps I get mark the beginning of each period rather than the end. I tried the right and include.lowest arguments of cut(), but they don't seem to have any effect. I also tried a few dplyr permutations and couldn't get those to work either.
set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
ret=rnorm(100))
dat$by15 = cut(dat$time, breaks="15 min")
dat.summary = aggregate(ret ~ by15, FUN=sum, data=dat)
> head(dat)
time ret
1 2016-05-01 00:00:00 -0.1739740
2 2016-05-01 00:01:00 0.2906288
3 2016-05-01 00:02:00 -1.0067554
4 2016-05-01 00:03:00 0.3887459
5 2016-05-01 00:04:00 0.2865937
6 2016-05-01 00:05:00 -0.4570531
And aggregated:
> head(dat.summary)
by15 ret
1 2016-05-01 00:00:00 0.6711667
2 2016-05-01 00:15:00 -1.4344507
3 2016-05-01 00:30:00 3.0797471
4 2016-05-01 00:45:00 3.7564378
5 2016-05-01 01:00:00 -2.1308232
6 2016-05-01 01:15:00 -3.7179450
The problem is that the timestamp is taken as the beginning of the period. In the sample above, dat.summary should have looked like:
> head(dat.summary)
by15 ret
1 2016-05-01 00:14:00 0.6711667
2 2016-05-01 00:29:00 -1.4344507
3 2016-05-01 00:44:00 3.0797471
4 2016-05-01 00:59:00 3.7564378
5 2016-05-01 01:14:00 -2.1308232
6 2016-05-01 01:29:00 -3.7179450
The longer story is the following. I'd like to calculate the realized variance. There are functions for this in R: rRealizedVariance from the package realized and rRVar from highfrequency. The problem is that they calculate the *daily* realized variance. I'd like to do this for a different time period (e.g. hourly realized variance). The realized variance is just the sum of squared returns in a period (e.g. 60 minutes). In Python, daily realized variance is calculated as:
return returns.resample('D').agg(lambda x: x.pow(2).sum())
and I can calculate it on say 60 minute basis as:
return returns.resample('60min').agg(lambda x: x.pow(2).sum())
I'm trying to get this working in R.
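For reference, here is a minimal base-R sketch of what the hourly version might look like, assuming the same dat layout as above (a time column and a ret column). The tz = "UTC" argument and the rv / hour column names are my own choices for reproducibility, not anything required. Note that cut() still labels each hour by its start, which is the same labelling issue described above.

```r
set.seed(4984)
# Assumption: fixed UTC timezone so the hour boundaries are reproducible
start <- as.POSIXct("2016-05-01", tz = "UTC")
dat <- data.frame(time = seq(start, by = 60, length.out = 100),
                  ret  = rnorm(100))

# Group the minute bars by clock hour (labels are the START of each hour)
dat$hour <- cut(dat$time, breaks = "hour")

# Realized variance per hour = sum of squared returns within that hour
rv <- aggregate(ret ~ hour, data = dat, FUN = function(x) sum(x^2))
names(rv)[2] <- "rv"
rv
```

This mirrors the resample(...).agg(...) call: cut() plays the role of resample() and the anonymous function is the same sum of squares.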
EDIT: I'd like to align the time on 15-minute boundaries, so for example, if dat did not start on a clean 15-minute boundary:
dat = data.frame(time=seq(as.POSIXct("2016-05-01 00:03:00"), as.POSIXct("2016-05-01 00:03:00") + 60*99, by=60),
ret=rnorm(100))
the simple approach of adding 14 minutes to each label would leave every timestamp misaligned (00:17, 00:32, ... instead of 00:14, 00:29, ...).
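To make the desired behaviour concrete, here is a rough sketch of the kind of result I'm after, done by hand with epoch arithmetic instead of cut(). It floors each timestamp to its 15-minute clock bucket and then labels the bucket with its last minute bar. The tz = "UTC" setting and the width / bucket_start names are assumptions for illustration, and flooring epoch seconds only lines up with clock boundaries when the timezone offset is a multiple of 15 minutes.

```r
set.seed(4984)
# Assumption: fixed UTC timezone; data deliberately starts off-boundary at 00:03
start <- as.POSIXct("2016-05-01 00:03:00", tz = "UTC")
dat <- data.frame(time = seq(start, by = 60, length.out = 100),
                  ret  = rnorm(100))

width <- 15 * 60  # bucket width in seconds

# Floor each timestamp to the start of its 15-minute clock bucket
bucket_start <- floor(as.numeric(dat$time) / width) * width

# Label each bucket with its last minute bar (bucket start + 14 minutes)
dat$by15 <- as.POSIXct(bucket_start + width - 60,
                       origin = "1970-01-01", tz = "UTC")

dat.summary <- aggregate(ret ~ by15, data = dat, FUN = sum)
head(dat.summary)
```

With this input the first bucket holds only the 12 bars from 00:03 to 00:14 and is labelled 00:14, and the following labels are 00:29, 00:44 and so on. Using width instead of width - 60 would label by the right edge (00:15, 00:30, ...) if that convention is preferred.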