Organize data with maximum and minimum values in r

Question

I have a table like this:

which is generated by following code:

id <- c("1","2","1","2","1","1")
status <- c("open","open","closed","closed","open","closed")
date <- c("11-10-2017 15:10","10-10-2017 12:10","12-10-2017 22:10","13-10-2017 06:30","13-10-2017 09:30","13-10-2017 10:30")
data <- data.frame(id,status,date)
hour <- data.frame(do.call('rbind', strsplit(as.character(data$date),' ',fixed=TRUE)))
hour <- hour[,2]
hour <- as.POSIXlt(hour, format = "%H:%M")

And what I want to achieve is to select the earliest open time and latest closing time for each id. So the final result will look like this:

Currently I use sqldf to solve the problem:

sqldf("select * from (select id, status, date as closeDate, max(hour) as hour from data 
  where status='closed'
   group by id,status) as a
   join 
   (select id, status, date  as openDate, min(hour) as hour from data 
   where status='open'
   group by id,status) as b
  using(id);")

Question1: Is there a simpler way to do so?

Question2: If I select max(hour) as any other name rather than hour，the result will not be in the format of date and time, but a series of number like 1507864200, 1507807800. How to keep the time format while assigning different names to the columns?

Do you mean for `hour` to be a column in your data? Maybe you forgot a `data$hour <- hour` line? — Gregor Thomas, Oct 12 '17 at 20:29

score 0 · Answer 1 · answered Oct 12 '17 at 20:35

Using package plyr:

(for some reason, as shown here, you have to convert the hour to class as.POSIXct, otherwise you get an error message):

#add hour to data.frame:
data$hour <- as.POSIXct(hour)
library(plyr)
ddply(data, .(id), summarize, open=min(hour[status=="open"]),
     closed=max(hour[status=="closed"]))

Organize data with maximum and minimum values in r

1 Answers1