0

I've triangulated information from other SO answers for the below code, but getting stuck with an error message. Searched SO for similar errors and resolutions but haven't been able to figure it out, so help is appreciated.

For every group ("id"), I want to get the difference between the start times for consecutive rows.

Reproducible data:

require(dplyr)
df <-data.frame(id=as.numeric(c("1","1","1","2","2","2")), 
            start= c("1/31/17 10:00","1/31/17 10:02","1/31/17 10:45", 
                             "2/10/17 12:00", "2/10/17 12:20","2/11/17 09:40"))
time <- strptime(df$start, format = "%m/%d/%y %H:%M")
df %>%
group_by(id)%>%
mutate(diff = time - lag(time),
     diff_mins = as.numeric(diff, units = 'mins'))

Gets me error:

Error in mutate_impl(.data, dots) : Column diff must be length 3 (the group size) or one, not 6 In addition: Warning message: In unclass(time1) - unclass(time2) : longer object length is not a multiple of shorter object length

andrew_reece
  • 20,390
  • 3
  • 33
  • 58
Bananas
  • 3
  • 5

2 Answers2

1

Do you mean something like this?

There is no need for lag here, a simple diff on the grouped times is sufficient.

df %>%
    mutate(start = as.POSIXct(start, format = "%m/%d/%y %H:%M")) %>%
    group_by(id) %>%
    mutate(diff = c(0, diff(start)))
## A tibble: 6 x 3
## Groups:   id [2]
#     id start                diff
#  <dbl> <dttm>              <dbl>
#1    1. 2017-01-31 10:00:00    0.
#2    1. 2017-01-31 10:02:00    2.
#3    1. 2017-01-31 10:45:00   43.
#4    2. 2017-02-10 12:00:00    0.
#5    2. 2017-02-10 12:20:00   20.
#6    2. 2017-02-11 09:40:00 1280.
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • That worked beautifully, @Maurits. Thank you so much for your help. – Bananas May 12 '18 at 18:45
  • Only problem is "0" for the first entry of each group is going to cause problems with analysis down the road. Any way to make those NA (I think df[which[df ==0) <-NA will cause problems with differences that are actually 0 as opposed to being the first in the group)? Thank you so much for your help. – Bananas May 12 '18 at 18:56
  • @Bananas I'd still say there is no need for `lag` (and `difftime`); using a simple `diff` should be faster. Just replace `c(0, diff(start))` with `c(NA, diff(start))`. – Maurits Evers May 13 '18 at 02:33
0

You can use lag and difftime (per Hadley):

df %>%
  mutate(time = as.POSIXct(start, format = "%m/%d/%y %H:%M")) %>%
  group_by(id) %>%
  mutate(diff = difftime(time, lag(time)))

# A tibble: 6 x 4
# Groups:   id [2]
     id start         time                diff  
  <dbl> <fct>         <dttm>              <time>
1    1. 1/31/17 10:00 2017-01-31 10:00:00 <NA>  
2    1. 1/31/17 10:02 2017-01-31 10:02:00 2     
3    1. 1/31/17 10:45 2017-01-31 10:45:00 43    
4    2. 2/10/17 12:00 2017-02-10 12:00:00 <NA>  
5    2. 2/10/17 12:20 2017-02-10 12:20:00 20    
6    2. 2/11/17 09:40 2017-02-11 09:40:00 1280  
andrew_reece
  • 20,390
  • 3
  • 33
  • 58
  • thank you so much. This works perfectly for me, as it handles the first of each group as an NA. – Bananas May 12 '18 at 18:47
  • You're welcome! Please mark this answer as accepted if you feel it's the best resolution to your question, by clicking the checkmark to the left of the answer. – andrew_reece May 12 '18 at 23:37