I just can't get this trivial behaviour to work. I've included a lot of examples to show the problem.

I have minute bar data and I'd like to group every 15 bars and apply a function to each group, which should give 15-minute bar data. The timestamps I get mark the beginning of each period rather than the end. I tried the `right` and `include.lowest` arguments of `cut()`, but they don't seem to change this. I also tried a few dplyr permutations and couldn't get those to work either.

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 ret=rnorm(100))
dat$by15 = cut(dat$time, breaks="15 min")
dat.summary = aggregate(ret ~ by15, FUN=sum, data=dat)


> head(dat)
                 time        ret
1 2016-05-01 00:00:00 -0.1739740
2 2016-05-01 00:01:00  0.2906288
3 2016-05-01 00:02:00 -1.0067554
4 2016-05-01 00:03:00  0.3887459
5 2016-05-01 00:04:00  0.2865937
6 2016-05-01 00:05:00 -0.4570531

And aggregated:

> head(dat.summary)
                 by15        ret
1 2016-05-01 00:00:00  0.6711667
2 2016-05-01 00:15:00 -1.4344507
3 2016-05-01 00:30:00  3.0797471
4 2016-05-01 00:45:00  3.7564378
5 2016-05-01 01:00:00 -2.1308232
6 2016-05-01 01:15:00 -3.7179450

The problem is that the timestamp is taken as the beginning of the period. In the sample above, dat.summary should have looked like:

> head(dat.summary)
                 by15        ret
1 2016-05-01 00:14:00  0.6711667
2 2016-05-01 00:29:00 -1.4344507
3 2016-05-01 00:44:00  3.0797471
4 2016-05-01 00:59:00  3.7564378
5 2016-05-01 01:14:00 -2.1308232
6 2016-05-01 01:29:00 -3.7179450

The longer story is the following. I'd like to calculate the realized variance. There are functions for this in R: rRealizedVariance from the package realized and rRVar from highfrequency. The problem is that they calculate the *daily* realized variance. I'd like to do this for a different time period (e.g. hourly realized variance). The realized variance is just the sum of squared returns in a period (e.g. 60 minutes). In Python, daily realized variance is calculated as:

    return returns.resample('D').agg(lambda x: x.pow(2).sum())

and I can calculate it on, say, a 60-minute basis as:

    return returns.resample('60min').agg(lambda x: x.pow(2).sum())

I'm trying to get this working in R.

EDIT: I'd like to align the time on 15-minute boundaries. So, for example, if dat did not start on a clean 15-minute boundary:

dat = data.frame(time=seq(as.POSIXct("2016-05-01 00:03:00"), as.POSIXct("2016-05-01 00:03:00") + 60*99, by=60),
                 ret=rnorm(100))

the simple method of adding 14 minutes will cause every timestamp to be misaligned.

s5s

1 Answer

One naïve answer would be:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 ret=rnorm(100))
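# cut() labels each bin with its start time
dat$by15 = cut(dat$time, breaks="15 min")

# add 14 minutes so each label marks the last bar of its period
# (only correct when the data starts on a 15-minute boundary)
dat$by15 <- as.POSIXct(as.character(dat$by15)) + 60 * 14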

dat.summary = aggregate(ret ~ by15, FUN=sum, data=dat)

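Edit 1: specify the breaks explicitly, so the bins fall on clean 15-minute boundaries regardless of where the data starts: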

min_date <- "2016-05-01"
max_date <- "2016-06-01"
dat$by15 = cut(dat$time, breaks=seq(as.POSIXct(paste(min_date, "00:00:00")),
                                    as.POSIXct(paste(max_date, "00:15:00")), by = "15 min"))
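
As an alternative to listing the breaks by hand, here is a minimal base-R sketch that floors each timestamp to its 15-minute boundary and labels the bin by its last minute bar. The helper name `label_bar_end` and its arguments are hypothetical, not part of any package. It assumes the time zone's UTC offset is a whole multiple of the bin width, since the flooring is done on seconds since the epoch. The same helper is then used for hourly realized variance (the sum of squared returns), as asked in the question.

```r
# Hypothetical helper: map each timestamp to the label of the last bar
# in its clock-aligned bin (e.g. 00:03 -> 00:14 for 15-minute bins).
# Assumes the UTC offset of the time zone is a multiple of the bin width.
label_bar_end <- function(time, minutes = 15, bar_secs = 60) {
  width  <- minutes * 60                 # bin width in seconds
  offset <- as.numeric(time) %% width    # seconds past the previous boundary
  time - offset + (width - bar_secs)     # bin start + width - one bar
}

set.seed(4984)
start <- as.POSIXct("2016-05-01 00:03:00", tz = "UTC")  # not on a boundary
dat <- data.frame(time = seq(start, by = 60, length.out = 100),
                  ret  = rnorm(100))

# 15-minute sums, labelled 00:14, 00:29, ... even though the data starts at 00:03
dat$by15 <- label_bar_end(dat$time, minutes = 15)
dat.summary <- aggregate(ret ~ by15, FUN = sum, data = dat)

# hourly realized variance: sum of squared returns within each hour
dat$by60 <- label_bar_end(dat$time, minutes = 60)
rv.hourly <- aggregate(ret ~ by60, FUN = function(x) sum(x^2), data = dat)
```

Note that the first and last bins are partial here (the data covers 00:03 to 01:42), so their labels still read 00:14 and 01:44 although those bins hold fewer than 15 bars.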
Rémi Coulaud
  • If I redefine the data as follows, then adding 14 minutes causes the whole thing to be misaligned: `dat = data.frame(time=seq(as.POSIXct("2016-05-01 00:03:00"), as.POSIXct("2016-05-01 00:03:00") + 60*99, by=60), ret=rnorm(100))` – s5s Jun 20 '21 at 12:07
  • Don't you just want to change the labels of your `cut`? – Rémi Coulaud Jun 20 '21 at 12:08
  • I do, but unfortunately, it has to align on N-minute boundaries in this case, 15 min boundaries. It just so happens that the example I gave starts at a boundary so the method of adding 14 minutes will shift each cut to the right place. What if the data does not start at a boundary? If it starts at 3-min past the hour then the cut will not be 0-14, 15-29, 30-44, 45-59 but will be 3-17, 18-32, 33-47, 48-02 – s5s Jun 20 '21 at 12:12
  • Did you try specifying the breaks you want, as in Edit 1? That is often a good solution. – Rémi Coulaud Jun 20 '21 at 12:16
  • Does specifying the breaks solve your problem? – Rémi Coulaud Jun 20 '21 at 13:36
  • Yes, I believe so. I'm looking at adding the data to the model and joining it with other data but as far as I can tell from examining it by hand, adding the breaks makes it behave as expected – s5s Jun 20 '21 at 14:11