I have a data frame and it looks like this:

submition_id       time_answered
           2 2017-11-06 14:19:30
           2 2017-11-06 14:22:45
           2 2017-11-06 14:24:55
           2 2017-11-06 14:34:10
           4 2017-11-06 14:31:23
           4 2017-11-06 14:33:21
           5 2017-11-06 14:54:48
           5 2017-11-06 14:59:38 
           5 2017-11-06 15:05:39
         ...                 ...

So basically my submition_id column is a factor column and the time_answered column is a POSIXct column. I want to find the time difference for each factor level, which is basically max(x) - min(x). How can I do this efficiently with built-in functions, such that I get

submition_id       difference
           2            14.67
           4             1.96
           5             9.77
         ...              ... 

I would like it in minutes as well.

Sotos
arezaie
  • You can use `aggregate` with `difftime` to find the difference between the highest and lowest time grouped by `submition_id`, or `library(dplyr); df1 %>% group_by(submition_id) %>% summarise(difference = difftime(min(time_answered), max(time_answered), unit = 'min'))` – akrun Nov 17 '17 at 09:53
  • Please use `dput()` to show your data! – jogo Nov 17 '17 at 09:54
  • My data won't fit in here, I wish I could – arezaie Nov 17 '17 at 10:00
  • @arezaie you can just save the first 10 rows of your data as another dataframe and post that one. No need for the whole dataset :-) – AntoniosK Nov 17 '17 at 10:05
  • `structure(list(submition_id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("2", "4", "5"), class = "factor"), time_answered = structure(c(1509974370, 1509974372, 1509974378, 1509974381, 1509975107, 1509975110, 1509975112, 1509975217, 1509975217, 1509975221, 1509975226, 1509975229), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("submition_id", "time_answered"), class = "data.frame", row.names = c(NA, -12L))` – arezaie Nov 17 '17 at 10:11
  • Well, @akrun's code almost worked. I had to switch the order of min and max, since it would give me negative values otherwise – arezaie Nov 17 '17 at 10:14
  • Yeah, it should be that way. I didn't test the code – akrun Nov 17 '17 at 10:16
  • Why the downvote? Was a perfectly fine question – arezaie Nov 17 '17 at 10:29
  • Probably because no research and no attempt to solve the problem was made. See the [ask] page. – robinCTS Nov 17 '17 at 10:36
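
The `aggregate` plus `difftime` idea from the first comment was not spelled out in the thread. A minimal base R sketch of it might look like the following, assuming a data frame `df` rebuilt from the rows shown in the question; the names `df`, `span_mins` and `res`, and the fixed `tz = "UTC"`, are illustrative assumptions and not part of the original post.

```r
# Sample rows copied from the question (tz fixed to UTC only for reproducibility)
df <- data.frame(
  submition_id = factor(c(2, 2, 2, 2, 4, 4, 5, 5, 5)),
  time_answered = as.POSIXct(c(
    "2017-11-06 14:19:30", "2017-11-06 14:22:45", "2017-11-06 14:24:55",
    "2017-11-06 14:34:10", "2017-11-06 14:31:23", "2017-11-06 14:33:21",
    "2017-11-06 14:54:48", "2017-11-06 14:59:38", "2017-11-06 15:05:39"
  ), tz = "UTC")
)

# Hypothetical helper: span between latest and earliest time, in minutes
span_mins <- function(x) as.numeric(difftime(max(x), min(x), units = "mins"))

# One row per submition_id, with the helper applied to each group's times
res <- aggregate(df["time_answered"], by = df["submition_id"], FUN = span_mins)
names(res)[2] <- "difference"
print(res)
```

Wrapping `difftime` in `as.numeric` gives a plain number of minutes per group. Note that `max` comes first, which avoids the negative values discussed in the comments.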

3 Answers

I believe there must be similar questions on SO.

However, for the sake of completeness, here is a data.table solution (including data):

library(data.table)
# Per-group span in minutes: latest answer time minus earliest
setDT(DT)[, .(diff = difftime(max(time_answered), min(time_answered), units = "mins")),
          by = submition_id]
   submition_id           diff
1:            2 14.666667 mins
2:            4  1.966667 mins
3:            5 10.850000 mins

Data

DT <- readr::read_table("submition_id       time_answered
           2 2017-11-06 14:19:30
           2 2017-11-06 14:22:45
           2 2017-11-06 14:24:55
           2 2017-11-06 14:34:10
           4 2017-11-06 14:31:23
           4 2017-11-06 14:33:21
           5 2017-11-06 14:54:48
           5 2017-11-06 14:59:38 
           5 2017-11-06 15:05:39")
Uwe
library(dplyr)
# max first so the difference is positive; units = "mins" reports minutes
DF <- DF %>% group_by(submition_id) %>%
  summarise(difference = difftime(max(time_answered), min(time_answered), units = "mins"))
AntonCH
The lubridate package is very useful here. `interval` from lubridate creates a time span between two specific times, and you can then calculate the duration of that span, using `group_by` from dplyr and `arrange` ... Can you supply a data example (`dput` output), as others have suggested, so there is an extract of your data to work with?
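
Since this answer describes the approach without code, here is a minimal sketch of what it could look like, assuming a data frame `df` rebuilt from the rows shown in the question. The names `df` and `result`, the fixed `tz = "UTC"`, and the choice of `time_length()` to convert the interval to minutes are assumptions for illustration, not necessarily what the answerer had in mind.

```r
library(dplyr)
library(lubridate)

# Sample rows copied from the question (tz fixed to UTC only for reproducibility)
df <- data.frame(
  submition_id = factor(c(2, 2, 2, 2, 4, 4, 5, 5, 5)),
  time_answered = as.POSIXct(c(
    "2017-11-06 14:19:30", "2017-11-06 14:22:45", "2017-11-06 14:24:55",
    "2017-11-06 14:34:10", "2017-11-06 14:31:23", "2017-11-06 14:33:21",
    "2017-11-06 14:54:48", "2017-11-06 14:59:38", "2017-11-06 15:05:39"
  ), tz = "UTC")
)

result <- df %>%
  group_by(submition_id) %>%
  summarise(
    # interval() spans the earliest to the latest answer in each group;
    # time_length() expresses that span as a plain number of minutes
    difference = time_length(
      interval(min(time_answered), max(time_answered)),
      unit = "minute"
    )
  )
print(result)
```

The result here is a numeric column rather than a `difftime`, which may be easier to round or plot.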

sorearm
  • See if this works: `structure(list(submition_id = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("2", "4", "5"), class = "factor"), time_answered = structure(c(1509974370, 1509974372, 1509974378, 1509974381, 1509975107, 1509975110, 1509975112, 1509975217, 1509975217, 1509975221, 1509975226, 1509975229), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("submition_id", "time_answered"), class = "data.frame", row.names = c(NA, -12L))` – arezaie Nov 17 '17 at 10:09