I have a dataframe with two variables, time and dif,

library(lubridate)
a <- data.frame(
  time = seq(from = as.POSIXct("2019-01-01 01:01:00"), to = as.POSIXct("2019-01-01 01:15:00"), by = "min"),
  dif = make_difftime(mins = c(2, 3, 5, 5, 5, 2, 6, 6, 6, 6, 6, 6, 4, 4, 4)))

> a
                  time    dif
1  2019-01-01 01:01:00 2 mins
2  2019-01-01 01:02:00 3 mins
3  2019-01-01 01:03:00 5 mins
4  2019-01-01 01:04:00 5 mins
5  2019-01-01 01:05:00 5 mins
6  2019-01-01 01:06:00 2 mins
7  2019-01-01 01:07:00 6 mins
8  2019-01-01 01:08:00 6 mins
9  2019-01-01 01:09:00 6 mins
10 2019-01-01 01:10:00 6 mins
11 2019-01-01 01:11:00 6 mins
12 2019-01-01 01:12:00 6 mins
13 2019-01-01 01:13:00 4 mins
14 2019-01-01 01:14:00 4 mins
15 2019-01-01 01:15:00 4 mins

and I would like to get a sequence that starts at 01:01:00, adds that row's value of dif, and continues at 01:01:00 + 2 mins = 01:03:00. It then adds the value of dif found at 01:03:00 and continues at 01:03:00 + 5 mins = 01:08:00, and so on. The desired output is thus

                  time    dif
1  2019-01-01 01:01:00 2 mins
3  2019-01-01 01:03:00 5 mins
8  2019-01-01 01:08:00 6 mins
14 2019-01-01 01:14:00 4 mins

I have asked a similar question before (iterative cumsum where sum determines the next position to be added), but the non-loop solutions there involve accumulate() and Reduce(), which do not seem to work with POSIXct objects. At least, they produce the following error: binary '+' is not defined for "POSIXt" objects.

Does anyone know how to get this?

bumblebee

1 Answer

I agree with digEmAll that a loop is probably going to be a clearer solution than any clever non-loop solution that I can currently think of.
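For reference, a minimal sketch of what such a plain loop might look like. This is only an illustration, not necessarily what digEmAll had in mind. It rebuilds the example data with base R (as.difftime stands in for make_difftime, and tz = "UTC" is my own addition to keep the example reproducible), and it assumes that every dif is positive and that the values in time are unique.

```r
# Rebuild the question's example data with base R only
# (as.difftime stands in for lubridate::make_difftime; tz = "UTC" is an assumption).
a <- data.frame(
  time = seq(from = as.POSIXct("2019-01-01 01:01:00", tz = "UTC"),
             to   = as.POSIXct("2019-01-01 01:15:00", tz = "UTC"),
             by   = "min"),
  dif  = as.difftime(c(2, 3, 5, 5, 5, 2, 6, 6, 6, 6, 6, 6, 4, 4, 4),
                     units = "mins")
)

keep <- integer(0)  # row indexes visited so far (grows on every step)
i <- 1L             # start at the first row
while (length(i) == 1) {
  keep <- c(keep, i)
  # Linear search for the row whose time equals time[i] + dif[i];
  # which() returns integer(0) once the target lies past the last row.
  i <- which(a$time == a$time[i] + a$dif[i])
}

result <- a[keep, ]
```

Each step scans the whole time column and keep grows one element at a time, which is fine for small data but is the repeated work worth avoiding on larger inputs.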

Here's an approach that tries to avoid repeated linear searches and growing data structures. A join first builds a map that connects each row to the row that follows it, and the loop then only has to walk that map.

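a$row <- seq_len(nrow(a))
# For every row, the time reached by adding its own dif
b <- data.frame(from_row = a$row, time_to = a$time + a$dif)
# merge() sorts its result by the join key, so restore the original row order
m <- merge(b, a[, c("time", "row")], by.x = "time_to", by.y = "time", all.x = TRUE)
row_map <- m$row[order(m$from_row)]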

a$in_output <- FALSE
current_row <- 1

while(!is.na(current_row)) {
  a[current_row, "in_output"] <- TRUE
  current_row <- row_map[[current_row]]
}

a[a$in_output, c("time", "dif")]

                  time    dif
1  2019-01-01 01:01:00 2 mins
3  2019-01-01 01:03:00 5 mins
8  2019-01-01 01:08:00 6 mins
14 2019-01-01 01:14:00 4 mins

If you have a lot of data, maybe it'd be better to pre-allocate and/or grow a separate vector of row indexes instead of modifying a new column in the original data, but I hope this helps.
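A rough sketch of that pre-allocated variant, in case it is useful. It is self-contained, so it rebuilds the example data with base R (as.difftime and tz = "UTC" are my substitutions, not part of the question). It also builds the row map with match() on the underlying numeric seconds instead of a join, which is a swapped-in technique, and it assumes every dif is positive so the walk can never visit more than nrow(a) rows.

```r
# Rebuild the question's example data with base R only
# (as.difftime stands in for lubridate::make_difftime; tz = "UTC" is an assumption).
a <- data.frame(
  time = seq(from = as.POSIXct("2019-01-01 01:01:00", tz = "UTC"),
             to   = as.POSIXct("2019-01-01 01:15:00", tz = "UTC"),
             by   = "min"),
  dif  = as.difftime(c(2, 3, 5, 5, 5, 2, 6, 6, 6, 6, 6, 6, 4, 4, 4),
                     units = "mins")
)

# Map each row to the row reached by adding its own dif (NA if there is none).
# Matching on numeric seconds avoids relying on how match() treats POSIXct.
row_map <- match(as.numeric(a$time + a$dif), as.numeric(a$time))

visited <- integer(nrow(a))  # pre-allocated: the path cannot be longer than nrow(a)
n <- 0L                      # number of rows visited so far
current_row <- 1L
while (!is.na(current_row)) {
  n <- n + 1L
  visited[n] <- current_row
  current_row <- row_map[current_row]
}

result <- a[visited[seq_len(n)], c("time", "dif")]
```

Only the first n entries of visited are used, so the original data frame is left untouched and nothing is resized inside the loop.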

Callum Webb