I am looking to calculate differences in time for different groups based on beginning work times and end work times. How can I tell R to calculate difftime between two rows based on their labels couched in a group? Below is a sample data set:
library(data.table)
latemail <- function(N, st="2012/01/01", et="2012/02/01") {
st <- as.POSIXct(as.Date(st))
et <- as.POSIXct(as.Date(et))
dt <- as.numeric(difftime(et,st,unit="sec"))
ev <- sort(runif(N, 0, dt))
rt <- st + ev
}
#create our data frame
set.seed(42)
dt = latemail(20)
work = setDT(as.data.frame(dt))
work[,worker:= stringi::stri_rand_strings(2, 5)]
work[,dt:= as.POSIXct(as.character(work$dt), tz = "GMT")]
work[,status:=NA]
#order
setorder(work, worker, dt)
#add work times
work$status[1] = "start"
work$status[5] = "end"
work$status[6] = "start"
work$status[10] = "end"
work$status[11] = "start"
work$status[15] = "end"
work$status[16] = "start"
work$status[20] = "end"
table looks like this now:
dt worker status
1: 2012-01-04 23:11:31 VOuRp start
2: 2012-01-09 15:53:16 VOuRp NA
3: 2012-01-15 02:56:45 VOuRp NA
4: 2012-01-16 21:12:26 VOuRp NA
5: 2012-01-20 16:27:31 VOuRp end
6: 2012-01-22 15:34:05 VOuRp start
7: 2012-01-23 15:01:18 VOuRp NA
8: 2012-01-29 03:36:56 VOuRp NA
9: 2012-01-29 20:11:02 VOuRp NA
10: 2012-01-31 02:48:01 VOuRp end
11: 2012-01-04 10:24:38 u8zw5 start
12: 2012-01-08 17:02:20 u8zw5 NA
13: 2012-01-14 23:33:35 u8zw5 NA
14: 2012-01-15 12:23:52 u8zw5 NA
15: 2012-01-18 03:53:15 u8zw5 end
16: 2012-01-21 03:48:08 u8zw5 start
17: 2012-01-23 02:01:10 u8zw5 NA
18: 2012-01-26 12:51:10 u8zw5 NA
19: 2012-01-29 18:23:46 u8zw5 NA
20: 2012-01-29 22:22:14 u8zw5 end
Answer I'm looking for:
ultimately I would like to get the bottom values (labeled worker 1 and worker 2 just because wasn't sure how to do the parallel of set.seed()
for stringi). The following code gives me the first row for worker 1, but I'd like each shift for each worker:
difftime(as.POSIXct("2012-01-20 16:27:31"), as.POSIXct("2012-01-04 23:11:31"), units = "hours")
Work time time difference in hours
worker 1 377.2667 hours
worker 2 . . . .
In this example I have an even set of values between workers, but assuming I have variable rows between different workers what would that look like? I'm assuming some sort of difftime formula? I would perfer a data table solution as I am working with large data.