Removing unwanted Observation, and calculating Time Difference

Question

I have a data frame, when an id is audited it can be either Passed or failed at that particular time. how to calculate, the sum of all the time taken to change status from Failed to Passed state w.r.t id's. If an id has failed, failed, Passed state. I have to add time taken to change status from first failed to passed and second failed to passed state

time <- c("08-10-2018 08:36", "12-10-2018 07:53", "23-10-2018 23:09", "30-10-2018 18:24","07-11-2018 18:13","10-11-2018 05:47","19-11-2018 21:26","26-11-2018 14:04","16-10-2018 03:19","07-11-2018 19:00","09-11-2018 23:25","20-11-2018 19:24", "22-11-2018 01:12","28-11-2018 03:46","04-10-2018 15:05","15-10-2018 15:32","20-10-2018 06:21","26-10-2018 04:51","02-11-2018 00:28","09-11-2018 22:43","15-11-2018 22:39","21-11-2018 04:10","26-11-2018 13:29")
id <-c("A1","A1","A1","A1","A1","A1","A1","A1","A2","A2","A2","A2","A2","A2","A3","A3","A3","A3","A3","A3","A3","A3","A3")
status <- c("FAILED","PASSED","FAILED","PASSED","FAILED","PASSED","PASSED","PASSED","PASSED","FAILED","PASSED","FAILED","PASSED","PASSED","PASSED","FAILED","PASSED","PASSED","PASSED","FAILED","PASSED","PASSED","FAILED")

df <- data.frame(id, time, status)

Required Format:

ids <- c("A1","A2", "A3")
diff_time <- c(13.25, 3.46, 10.61)
df2 <- data.frame(ids,diff_time)

Thanks In Advance

But you have several transitions from Failed -> Passed for each Id. How come you only have 1 value for each? — Sotos, Nov 30 '18 at 09:14
You need to elaborate more. This is not a coding service, you need to help us to help you. Provide more details like how `diff_time` is being calculated — Vivek Kalyanarangan, Nov 30 '18 at 09:15
I have a data frame, when an id is audited it can be either Passed or failed at that particular time. how to calculate, the sum of all the time taken to change status from Failed to Passed state w.r.t id's. If an id has failed, failed, Passed state. I have to add time taken to change status from first failed to passed and second failed to passed state. — Xenus, Nov 30 '18 at 10:05

Uwe · Answer 1 · 2018-12-03T13:11:37.303

If I understand correctly, the OP wants to measure the time difference between each FAILED event to the next subsequent PASSED event (for each id). Finally, the measured differences need to be summed up for each id.

This can be solved by a backward rolling join which is available with the data.table package.

First, we have to separate the FAILED and PASSED events. Then, a right join is used in order to find a subsequent PASSED event for each FAILED event. The two subsets are joined on id and time where the time is rolled backward (NOCB = next observation carried backwards).

library(data.table)
# coerce df to data.table, coerce time to POSIXct
setDT(df)[, time := lubridate::dmy_hm(time)]
# create subset PASSED
dfp <- df[status == "PASSED"][, timep := time]
# create subset FAILED
dff <- df[status == "FAILED"][, timef := time]
# backward rolling join
dfp[dff, on = .(id, time),  roll = -Inf]

   id                time status               timep i.status               timef
1: A1 2018-10-08 08:36:00 PASSED 2018-10-12 07:53:00   FAILED 2018-10-08 08:36:00
2: A1 2018-10-23 23:09:00 PASSED 2018-10-30 18:24:00   FAILED 2018-10-23 23:09:00
3: A1 2018-11-07 18:13:00 PASSED 2018-11-10 05:47:00   FAILED 2018-11-07 18:13:00
4: A2 2018-11-07 19:00:00 PASSED 2018-11-09 23:25:00   FAILED 2018-11-07 19:00:00
5: A2 2018-11-20 19:24:00 PASSED 2018-11-22 01:12:00   FAILED 2018-11-20 19:24:00
6: A3 2018-10-15 15:32:00 PASSED 2018-10-20 06:21:00   FAILED 2018-10-15 15:32:00
7: A3 2018-11-09 22:43:00 PASSED 2018-11-15 22:39:00   FAILED 2018-11-09 22:43:00
8: A3 2018-11-26 13:29:00   <NA>                <NA>   FAILED 2018-11-26 13:29:00

# rolling join and aggregate by id
dfp[dff, on = .(id, time),  roll = -Inf][, .(diff_time = sum(timep - timef, na.rm = TRUE)), by = id]

   id      diff_time
1: A1 13.254167 days
2: A2  3.425694 days
3: A3 10.614583 days

Removing unwanted Observation, and calculating Time Difference

1 Answers1