This question is similar to a question already posted few days ago, Collapse rows from 0 to 0
The new twist here which is different from the previous question is this, how do we collapse rows by Id for only those rows where the time different is less than or equal to 60.
For example, using the same dataset
Incident.ID.. date product
INCFI0000029582 2014-09-25 08:39:45 foo
INCFI0000029582 2014-09-25 08:39:45 foo
INCFI0000029582 2014-09-25 08:39:48 bar
INCFI0000029582 2014-09-25 08:40:44 foo
INCFI0000029582 2014-10-10 23:04:00 foo
INCFI0000029587 2014-09-25 08:33:32 bar
INCFI0000029587 2014-09-25 08:34:41 bar
INCFI0000029587 2014-09-25 08:35:24 bar
INCFI0000029587 2014-10-10 23:04:00 foo
df <- structure(list(Incident.ID.. = c("INCFI0000029582", "INCFI0000029582","INCFI0000029582",
"INCFI0000029582", "INCFI0000029582", "INCFI0000029587", "INCFI0000029587",
"INCFI0000029587", "INCFI0000029587"), date = c("2014-09-25 08:39:45","2014-09-25 08:39:45",
"2014-09-25 08:39:48", "2014-09-25 08:40:44", "2014-10-10 23:04:00",
"2014-09-25 08:33:32", "2014-09-25 08:34:41", "2014-09-25 08:35:24",
"2014-10-10 23:04:00"), product =
c("foo","foo","bar","foo","foo","bar","bar","bar","foo")),
class = "data.frame", row.names = c(NA,
-L))
This calculates the time difference by ID
library(dplyr)
library(lubridate)
df1 <- df %>%
group_by(Incident.ID..) %>%
arrange(ymd_hms(date)) %>%
mutate(diff = c(0, diff(ymd_hms(date))))
Which results in this new column diff as shown below
Incident.ID.. date product diff
INCFI0000029582 2014-09-25 08:39:45 foo 0
INCFI0000029582 2014-09-25 08:39:45 foo 0
INCFI0000029582 2014-09-25 08:39:48 bar 3
INCFI0000029582 2014-09-25 08:40:44 foo 56
INCFI0000029582 2014-10-10 23:04:00 foo 1347796
INCFI0000029587 2014-09-25 08:33:32 bar 0
INCFI0000029587 2014-09-25 08:34:41 bar 69
INCFI0000029587 2014-09-25 08:35:24 bar 43
INCFI0000029587 2014-10-10 23:04:00 foo 1348116
Now only collapsing rows by Incident.ID..
where the time difference is less than or equal to 60, i.e diff <= 60
should result in a final dataset like this
Incident.ID.. DateMin DateMax product diff_collapse
INCFI0000029582 2014-09-25 08:39:45 2014-09-25 08:40:44 foo,bar,foo 0,0,3,56
INCFI0000029582 2014-09-25 08:40:44 2014-10-10 23:04:00 foo 1347796
INCFI0000029587 2014-09-25 08:33:32 2014-09-25 08:34:41 bar 0
INCFI0000029587 2014-09-25 08:34:41 2014-09-25 08:35:24 bar,bar 69,43
INCFI0000029587 2014-09-25 08:35:24 2014-10-10 23:04:00 foo 1348116
Looking for some help on how to create such a collapsed dataset. Thanks in advance.