1

Suppose I have a zoo object (or it could be a data.frame) that has an index on "time of day" and has some value (see sample data below):

                    val
...
2006-08-01 12:00    23
2006-08-01 12:01    24
2006-08-01 12:02    25
2006-08-01 12:03    26
2006-08-01 12:04    27
2006-08-01 12:05    28
2006-08-01 12:06    29
...
2006-08-02 12:00    123
2006-08-02 12:01    124
2006-08-02 12:02    125
2006-08-02 12:03    126
2006-08-02 12:04    127
...

I would like to call a custom function (call it custom.func(vals)) from 12:01 - 12:03 (i.e. something similar to zoo::rollapply) every time that interval occurs so in this example, daily. How would I do that?


NOTES (for robustness, it would also be great to take into account the following edge cases but not necessary):

  1. Don't assume that I have values for 12:01 - 12:03 every day
  2. Don't assume that the entire range 12:01 - 12:03 is present every day. Some days I might only have 12:01 and 12:02 but might be missing 12:03
  3. What if I wanted my custom.func(vals) to be called on day boundaries like using val from 23:58 - 00:12?
Denis
  • 11,796
  • 16
  • 88
  • 150

2 Answers2

1

I recommend runner package which allows to compute any rolling function on irregular time series. Function runner is equivalent of rollApply with distinction that it can depend on dates. runner allows to apply any R function on window length defined by k with date idx (or any integer). Example below calculates regression on 5-minutes (5*60 sec) window span. Algorithm don't care if there will be day-change, just compute 5-minutes each time (for example 23:56-00:01).

Create data:

set.seed(1)
x <- cumsum(rnorm(1000))
y <- 3 * x + rnorm(1000)
time <- as.POSIXct(cumsum(sample(60:120, 1000, replace = TRUE)), 
                   origin = Sys.Date()) # unequaly spaced time series
data <- data.frame(time, y, x)

Custom function to be called on sliding windows:

library(runner)

running_regression <- function(idx) {
  predict(lm(y ~ x, data = data))[max(idx)]
}

data$pred <- runner(seq_along(x), 
                    k = 60 * 5,
                    idx = time,
                    f = running_regression)



Once we have created dataset with rolling 5-minute prediction, then we can filter only particular windows - here, only 1-st minute of the hour. It means that we always keep {hh}:56 - {hh+1}:01


library(dplyr)
library(lubridate)
filtered <-
  data %>% 
  filter(minute(time) == 1)


plot(data$time, data$y, type = "l", col = "red")
points(filtered$time, filtered$pred, col = "blue")

enter image description here

There are some other examples in vignette how to do this with runner

GoGonzo
  • 2,637
  • 1
  • 18
  • 25
1

Suppose our input is the POSIXct zoo object z given in the Note at the end.

Create a character vector times which has one element per element of z and is in the form HH:MM. Then create a logical ok which indicates which times are between the indicated boundary values. z[ok] is then z reduced to those values. Finally for each day apply sum (can use some other function if desired) using aggregate.zoo :

times <- format(time(z), "%H:%M")
ok <- times >= "12:01" & times <= "12:03"
aggregate(z[ok], as.Date, sum)
## 2006-08-01 2006-08-02 
##         75        375 

times straddle midnight

The version is for the case where the times straddle midnight. Note that the order of values sent to the function is not the original order but if the function is symmetric that does not matter.

times <- format(time(z), "%H:%M")
ok <- times >= "23:58" | times <= "00:12"
aggregate(z[ok], (as.Date(format(time(z))) + (times >= "23:58"))[ok], sum)
## 2006-08-02 
##         41 

Variation

The prior code chunk works if the function is symmetric in the components of its argument (which is the case for many functions such as mean and sum) but if the function were not symmetric we would need a slightly different approach. We define to.sec which translates an HH:MM string to numeric seconds and subtract to.sec("23:58") from each POSIXct datetime. Then the components of z to keep are those whose transformed times converted to HH:MM character strings that are less than "00:14".

to.sec <- function(x) with(read.table(text = x, sep = ":"), 3600 * V1 + 60 * V2)
times <- format(time(z) - to.sec("23:58"), "%H:%M")
ok <- times <= "00:14"
aggregate(z[ok], as.Date(time(z)[ok] - to.sec("23:58")), sum)
## 2006-08-01 
##         41 

Note

Lines <- "datetime val
2006-08-01T12:00    23
2006-08-01T12:01    24
2006-08-01T12:02    25
2006-08-01T12:03    26
2006-08-01T12:04    27
2006-08-01T12:05    28
2006-08-01T12:06    29
2006-08-01T23:58    20
2006-08-02T00:01    21
2006-08-02T12:00    123
2006-08-02T12:01    124
2006-08-02T12:02    125
2006-08-02T12:03    126
2006-08-02T12:04    127"

library(zoo)
z <- read.zoo(text = Lines, tz = "", header = TRUE, format = "%Y-%m-%dT%H:%M")

EDIT

Have revised the non-symmetric code and simplified all code chunks.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341