0

I am working to calculate the difference in hours between two timestamps in a dataframe. One timestamp is in the previous row i.e. lasttime and the other is in the leading row i.e. origtime. Here is a small example:-

library(dplyr)
df<-structure(list(DateTime_Start = structure(c(1612284131.736, 1612296614.851
), tzone = "UTC", class = c("POSIXct", "POSIXt")), DateTime_End = structure(c(1612284195.736, 
                                                                              1612296715.851), tzone = "UTC", class = c("POSIXct", "POSIXt"
                                                                              ))), row.names = c(NA, -2L), class = "data.frame")

df

#       DateTime_Start        DateTime_End
#1 2021-02-02 16:42:11 2021-02-02 16:43:15
#2 2021-02-02 20:10:14 2021-02-02 20:11:55

lasttime<-lag(df$DateTime_End)
lasttime
#[1] NA                        "2021-02-02 16:43:15 UTC"

origtime<-lead(df$DateTime_Start)
origtime
#[1] "2021-02-02 20:10:14 UTC" NA

You can notice that it returns an NA next to each run of lasttime and origtime, seems to be a place holder for the other timestamp on the row within df. Then when I try to calculate difftime, I get this:-

difftime(origtime,lastime)
#Time differences in secs
#[1] NA NA

I think this is caused by the NA in origtime and lasttime. Can anyone point out a solution to get the difference between these two datetime objects?

Many thanks!

metaltoaster
  • 380
  • 2
  • 15

2 Answers2

1

The 1st value for lag and last value for lead is always NA by default. Since there is no previous value of 1st row in lag and no next value of last row in lead.

You need to change the default value to get value which is not NA. For example, to get 0 as first value you can do -

library(dplyr)

df %>%
  mutate(difference = difftime(DateTime_Start, lag(DateTime_End, default = first(DateTime_Start)), units = 'hours'))

#       DateTime_Start        DateTime_End     difference
#1 2021-02-02 16:42:11 2021-02-02 16:43:15 0.000000 hours
#2 2021-02-02 20:10:14 2021-02-02 20:11:55 3.449754 hours

In the second row you are calculating DateTime_Start[2] - DateTime_End[1]. If you have more than 2 rows in the 3rd row you'll get output for DateTime_Start[3] - DateTime_End[2] and so on.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

Lag and Lead are looking for the previous and future time stamps. If there isn't one, then they return NA. If that is not the behaviour you want, what behaviour do you want?

You have 4 timestamps. I have absolutely no idea what you are trying to calculate the difference with!

Are you trying to take the difference from the very first start time and the very last endtime?

difftime(min(df$DateTime_Start), max(df$DateTime_End))

If you add a third row to the data, you'd get a difftime for the middle row. Add 200 rows and you will get 200 results and two NA's.

CALUM Polwart
  • 497
  • 3
  • 5
  • I am trying to find the difference between ```DateTime_End``` in the first row and the ```DateTime_Start``` in the second row. As you are inferring, ```lag()``` and ```lead()``` are not appropriate, so what could I use as an alternative? – metaltoaster Aug 30 '21 at 10:25