I've tried to use lag on a column of a data frame, but when time is involved it just won't work. I've tried shift, lag and tlag.

Example:

y = strptime(sprintf("%s:%s:%s", 4, 20, 10), "%H:%M:%S")
yy = strptime(sprintf("%s:%s:%s", 10, 20, 10), "%H:%M:%S")
lag(c(y,yy))

Error in format.POSIXlt(x, usetz = usetz) : invalid component [[10]] in "POSIXlt" should be 'zone'

tlag(c(y,yy))

Error in n_distinct_multi(list(...), na.rm) : argument "time" is missing, with no default

shift(c(y,yy))
[[1]]
[1] NA 10

[[2]]
[1] NA 20

[[3]]
[1] NA  4

[[4]]
[1] NA  4

[[5]]
[1] NA  6

[[6]]
[1]  NA 117

[[7]]
[1] NA  2

[[8]]
[1]  NA 184

[[9]]
[1] NA  1

[[10]]
[1] NA    "BST"

[[11]]
[1]   NA 3600

I don't want any time differences; I simply want the value from the row above in my data frame, which I thought was what lag did: "Lead and lag are useful for comparing values offset by a constant (e.g. the previous or next value)". The time shouldn't even matter; it should just pick whatever numeric/character/time value is in the previous position. How do I fix this, or is there a different function that does what I'd like? I do not want to involve any loops, as speed is important and the data frames are large.

Example from my dataframe:

structure(list(sec = c(52, 53, 54, 55, 56, 57, 58, 59, 0, 1), 
    min = c(50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 51L, 51L), 
    hour = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L
    ), mday = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), mon = c(6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), year = c(117L, 117L, 
    117L, 117L, 117L, 117L, 117L, 117L, 117L, 117L), wday = c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), yday = c(184L, 184L, 
    184L, 184L, 184L, 184L, 184L, 184L, 184L, 184L), isdst = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), zone = c("BST", "BST", 
    "BST", "BST", "BST", "BST", "BST", "BST", "BST", "BST"), 
    gmtoff = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt", 
"POSIXt"))
Olivia
  • It would be better to post a reproducible example containing the data frame you mention in the text. My suspicion is that what you would like to do is quite straightforward with a tidyverse approach using mutate and lag, but it's hard to see at the moment. – JanLauGe Jul 04 '17 at 10:32
  • What's the expected output? – AK88 Jul 04 '17 at 10:32
  • Well the data frame would act the same as the vector above, which shows an error instead of expected 'NA, "2017-07-04 04:20:10 BST"' – Olivia Jul 04 '17 at 10:33
  • Always convert your time variables to `POSIXct` when working with `data.frame`s. `data.frame` does not handle `POSIXlt` very well, because it's a list internally, and `strptime` returns `POSIXlt`. – snaut Jul 04 '17 at 10:52
  • Thanks, that worked. Why/how does the answer below work without having to do any class conversion? – Olivia Jul 13 '17 at 08:21
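
The conversion suggested in the comments can be sketched roughly as follows. This is a minimal base R illustration, not the exact code anyone in the thread ran: the variable names `times` and `lag1` are made up here, and `tz = "UTC"` is an assumption added only to keep the result independent of the local time zone.

```r
# strptime() returns POSIXlt, which is a list of components internally.
y  <- strptime("04:20:10", "%H:%M:%S", tz = "UTC")
yy <- strptime("10:20:10", "%H:%M:%S", tz = "UTC")

# Convert each value to POSIXct (a numeric vector with a class attribute)
# before combining them.
times <- c(as.POSIXct(y), as.POSIXct(yy))

# Lag by one position with index subsetting: an NA index yields NA,
# and subsetting keeps the POSIXct class. No loop is involved.
lag1 <- times[c(NA, seq_len(length(times) - 1))]

print(lag1)
```

With the column stored as POSIXct, `dplyr::lag()` should behave the same way; the index-based version above is shown only so the sketch has no package dependencies.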

1 Answer

For a data.frame like the one below

  index                time
1     1 2017-07-04 04:20:10
2     2 2017-07-04 10:20:10

you can use dplyr

dplyr::lag(df$time, 1)
[1] NA                         "2017-07-04 04:20:10 CEST"

dplyr::lead(df$time, 1)
[1] "2017-07-04 10:20:10 CEST" NA         

And to add the lead/lag column to your data.frame you can use

dplyr::mutate(df, lead_1 = dplyr::lead(time, 1), lag_1 = dplyr::lag(time, 1))
  index                time              lead_1               lag_1
1     1 2017-07-04 04:20:10 2017-07-04 10:20:10                <NA>
2     2 2017-07-04 10:20:10                <NA> 2017-07-04 04:20:10          
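
One possible reason no explicit class conversion is needed here, assuming `df` was built with `data.frame()` (the construction is not shown above, so this is a guess): `data.frame()` converts a POSIXlt column to POSIXct when the data frame is created. A small sketch, with `tz = "UTC"` and the object name `df_check` chosen only for illustration:

```r
# strptime() gives a POSIXlt value.
y <- strptime("04:20:10", "%H:%M:%S", tz = "UTC")
class(y)
# [1] "POSIXlt" "POSIXt"

# Putting it into a data.frame() converts the column to POSIXct.
df_check <- data.frame(index = 1, time = y)
class(df_check$time)
# [1] "POSIXct" "POSIXt"
```

If the column was instead assigned afterwards (for example with `df$time <- y`), it may still be POSIXlt, and the conversion with `as.POSIXct()` would be needed first.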
BongoBob
  • Why does using POSIXlt work inside mutate, but without mutate I have to convert to POSIXct? – Olivia Jul 13 '17 at 08:20