2

I have a dataset that looks like this.

library(tidyverse)

data <- structure(list(Timestamp = c("12/7/2021 11:29:15", "12/7/2021 11:29:35", 
                                     "12/7/2021 11:59:18", "12/7/2021 12:28:53", 
                                     "12/7/2021 13:19:12"), 
                        ID = c(157767, 380925, 319956, 375889, 375889), 
                        Status = c("Yes", "No", "Yes", "No", "Yes")), 
                        row.names = c(NA, -5L), class = c("tbl_df",                                                                                             
                        "tbl", "data.frame"))

print(data)
> data
# A tibble: 5 x 3
  Timestamp              ID Status
  <chr>               <dbl> <chr> 
1 12/7/2021 11:29:15 157767 Yes   
2 12/7/2021 11:29:35 380925 No    
3 12/7/2021 11:59:18 319956 Yes   
4 12/7/2021 12:28:53 375889 No    
5 12/7/2021 13:19:12 375889 Yes  

Rows 4 & 5 have the same ID. Based off the TimeStamp, I can see that row 5 was done later and has and updated status.

Is it possible for me to filter based off timestamp to only show the most recent timestamps in my data frame? my Desired output would be the following

  Timestamp              ID Status
  <chr>               <dbl> <chr> 
1 12/7/2021 11:29:15 157767 Yes   
2 12/7/2021 11:29:35 380925 No    
3 12/7/2021 11:59:18 319956 Yes      
4 12/7/2021 13:19:12 375889 Yes 

Thank you.

RL_Pug
  • 697
  • 7
  • 30

2 Answers2

4

With dplyr::slice_max you could do:

Note: As your Timestamp is a character we first have to convert to a datetime to make this work (Thanks to @utubun for pointing that out in his comment)

library(dplyr)

data %>% 
  group_by(ID) %>% 
  slice_max(as.POSIXct(Timestamp, format = "%d/%m/%Y %H:%M:%S")) %>% 
  ungroup() %>% 
  arrange(Timestamp)
#> # A tibble: 4 × 3
#>   Timestamp              ID Status
#>   <chr>               <dbl> <chr> 
#> 1 12/7/2021 11:29:15 157767 Yes   
#> 2 12/7/2021 11:29:35 380925 No    
#> 3 12/7/2021 11:59:18 319956 Yes   
#> 4 12/7/2021 13:19:12 375889 Yes
stefan
  • 90,330
  • 6
  • 25
  • 51
  • It's not save without conversion to date, e.g. `max(c('12/7/2021 13:19:12', '12/10/2021 13:09:12'))` results to `"12/7/2021 13:19:12"`, because characters are ordered alphabetically. – utubun Dec 07 '21 at 19:39
  • 1
    @utubun. Thx. Haven't realized that Timestamp isn't a date time already. Will fix that. – stefan Dec 07 '21 at 19:45
1
  1. First transform Timestamp to dttm class.
  2. group and filter by last()
library(dplyr)
library(lubridate)
data %>% 
  mutate(Timestamp = dmy_hms(Timestamp)) %>% 
  group_by(ID) %>% 
  filter(Timestamp == last(Timestamp))
  Timestamp               ID Status
  <dttm>               <dbl> <chr> 
1 2021-07-12 11:29:15 157767 Yes   
2 2021-07-12 11:29:35 380925 No    
3 2021-07-12 11:59:18 319956 Yes   
4 2021-07-12 13:19:12 375889 Yes 
``´
TarJae
  • 72,363
  • 6
  • 19
  • 66