0

I have a dataframe and I want to filter only rows that corresponds to the earliest datetime. The following is my dataframe:

library(lubridate)
df<-data.frame(ID=rep(1:2,5:6),DATETIME_OF_PROCEDURE=rep(c(ymd_hms("2013-03-16 03:00:42"),
                                                           ymd_hms("2013-02-12 08:00:42"),
                                                           ymd_hms("2014-06-19 01:00:42"),
                                                           ymd_hms("2014-01-22 01:00:42"),
                                                           ymd_hms("2014-06-12 02:00:40")),
                                                         c(3,2,2,2,2)))

I would like to get:

#     > df
#    ID DATETIME_OF_PROCEDURE
# 1   1   2013-02-12 16:00:42
# 2   1   2013-02-12 16:00:42
# 3   2   2014-01-22 09:00:42
# 4   2   2014-01-22 09:00:42

I tried the following but it gives an empty dataframe.

df %>% 
  arrange(DATETIME_OF_PROCEDURE) %>% 
  group_by(ID) %>%
  slice(min(DATETIME_OF_PROCEDURE))

I also tried slice(first(DATETIME_OF_PROCEDURE)) but again, it wouldn't work.

Sotos
  • 51,121
  • 6
  • 32
  • 66
HNSKD
  • 1,614
  • 2
  • 14
  • 25
  • The correct syntax would be `df %>% arrange(DATETIME_OF_PROCEDURE) %>% group_by(ID) %>% slice(1L)` but `slice` will not capture ties. – Sotos May 08 '17 at 09:29

2 Answers2

2
df %>% 
    group_by(ID) %>% 
    filter(DATETIME_OF_PROCEDURE == min(DATETIME_OF_PROCEDURE))
Iaroslav Domin
  • 2,698
  • 10
  • 19
2

Since you already arrange, slice(1) works fine:

df %>% 
  arrange(DATETIME_OF_PROCEDURE) %>% 
  group_by(ID) %>%
  slice(1)

Other use which.min, since filter takes indices:

df %>% 
  group_by(ID) %>%
  slice(which.min(DATETIME_OF_PROCEDURE))

Be careful of ties in your data. Use filter if you want to retain all ties.

Both return the same:

Source: local data frame [2 x 2]
Groups: ID [2]

     ID DATETIME_OF_PROCEDURE
  <int>                <dttm>
1     1   2013-02-12 09:00:42
2     2   2014-01-22 02:00:42
Axeman
  • 32,068
  • 8
  • 81
  • 94