0

Given a data set like below. I would like to count how many times a particular hour of the day (00:00, 01:00, ...., 22:00, 23:00) falls completely within any of the given intervals.

The date of occurrence doesn't matter. Just the overall count.

### This code is to create a data set similar to the one I am using. 

### This is a function I found on here to generate random times
latemail <- function(N, st="2012/01/01", et="2012/12/31") {
  st <- as.POSIXct(as.Date(st))
   et <- as.POSIXct(as.Date(et))
   dt <- as.numeric(difftime(et,st,unit="sec"))
   ev <- sort(runif(N, 0, dt))
   rt <- st + ev
}

set.seed(123)
startTimes <- latemail(5)
  endTimes <- startTimes +18000

my_data <- data.frame(startTimes, endTimes)

> my_data
                start                 end
1 2012-04-14 16:10:44 2012-04-14 21:10:44
2 2012-05-28 23:38:16 2012-05-29 04:38:16
3 2012-10-14 10:33:10 2012-10-14 15:33:10
4 2012-11-17 23:13:56 2012-11-18 04:13:56
5 2012-12-08 22:29:36 2012-12-09 03:29:36

So that hopefully helps give you an idea of what I am working with.

Ideally the output would be a dataset with one variable for the hour, and another for the count of occurrences. Like this

   hour count
1 00:00     3
2 01:00     3
3   etc     ?

How to doing this in different increments (say 15 minutes) would also be great to know.

Thank you!

Michael
  • 1,537
  • 6
  • 20
  • 42
  • It seems unclear to me how you would count "particular hour". You have intervals. For instance, if start time is 16:10:44 and ending time is 21:10:44, do you want to count 16 (+1), 17 (+1), 18 (+1), 19 (+1), 20 (+1) and 21 (+1)? Could you clarify your criteria? – jazzurro Feb 03 '15 at 01:03
  • By "falls completely within" I was trying to get across the idea that an hour is only counted if it is entirely contained within the interval. Your example would not count 16:00 or 21:00. Apologies if that was unclear. – Michael Feb 03 '15 at 01:07

1 Answers1

0

Here is my attempt. I am sure there are better ways of doing this. Given the comments above, I did the following. First, I took hour using ifelse. As you described in your commented, I rounded up/down hour here. Using transmute, I want to get a string including hours. In some cases, start hour can be larger than ending hour (in this case the record crosses dates). In order to deal with that, I used setdiff(), c(), and toString(). Using separate I separated hours into columns. I wanted to use cSplit() from the splitstackshape package, but I had an error message coming back. Hence, I chose separate() here. Once I had all hours separated, I reshaped the data using gather() and finally counted hour with count(). filter() was employed to remove NA cases. I hope this will help you to some extent.

** Data ** 

structure(list(startTimes = structure(c(1328621832.79254, 1339672345.94964, 
1343434566.9641, 1346743867.55964, 1355550696.37895), class = c("POSIXct", 
"POSIXt")), endTimes = structure(c(1328639832.79254, 1339690345.94964, 
1343452566.9641, 1346761867.55964, 1355568696.37895), class = c("POSIXct", 
"POSIXt"))), .Names = c("startTimes", "endTimes"), row.names = c(NA, 
-5L), class = "data.frame")

#           startTimes            endTimes
#1 2012-02-07 22:37:12 2012-02-08 03:37:12
#2 2012-06-14 20:12:25 2012-06-15 01:12:25
#3 2012-07-28 09:16:06 2012-07-28 14:16:06
#4 2012-09-04 16:31:07 2012-09-04 21:31:07
#5 2012-12-15 14:51:36 2012-12-15 19:51:36

library(dplyr)
library(tidyr)

mutate(my_data, start = ifelse(as.numeric(format(startTimes, "%M")) >= 0 & as.numeric(format(startTimes, "%S")) > 0,
                               as.numeric(format(startTimes, "%H")) + 1,
                               as.numeric(format(startTimes, "%H"))),
                end = ifelse(as.numeric(format(endTimes, "%M")) >= 0 & as.numeric(format(endTimes, "%S")) > 0,
                             as.numeric(format(endTimes, "%H")) - 1,
                             as.numeric(format(endTimes, "%H"))),
                start = replace(start, which(start == "24"), 0),
                end = replace(end, which(end == "-1"), 23)) %>%
rowwise() %>%
transmute(hour = ifelse(start < end, toString(seq.int(start, end, by = 1)),
                        toString(c(setdiff(seq(0, 23, by = 1), seq.int(end, start, by = 1)),
                                    start, end)))) %>%
separate(hour, paste("hour", 1:24, sep = "."), ", ", extra = "merge") %>%
gather(foo, hour) %>%
count(hour) %>%
filter(complete.cases(hour))

#   hour n
#1     0 2
#2     1 1
#3    10 1
#4    11 1
#5    12 1
#6    13 1
#7    15 1
#8    16 1
#9    17 2
#10   18 2
#11   19 1
#12    2 1
#13   20 1
#14   21 1
#15   22 1
#16   23 2
jazzurro
  • 23,179
  • 35
  • 66
  • 76
  • Thank you for your answer. It would be lovely if your code was a working example. "my_data" isn't actually created and I had to install and load the "stringi" package. I keep getting a "Error: incompatible types, expecting a character vector" when I try to run it on my own data. – Michael Feb 09 '15 at 23:48
  • @Michael Thank you for your reply. Could you clarify what you mean by saying `my_data` isn't actually created? I used `dput()` to provide the data I got, which is the chunk beginning with "structure.." in my answer. You need to do `my_data <- structure.....` Have you done that? I did not use the `stringi` package in the code, by the way. Let me know if you need more help. – jazzurro Feb 10 '15 at 00:36
  • you are right, adding my_data <- structure..... makes your example run. for some reason, when I try to apply your code to my data (which is using two POSIXct exactly like in my_data), the transmute() function gives me an error message "Error: incompatible types, expecting a numeric vector". Not sure if it is related to this https://github.com/hadley/dplyr/issues/464 or not. thanks for replying to my reply! – Michael Feb 10 '15 at 15:30
  • yeah, as far as "stringi" goes I swear I had to download it. but maybe I am just crazy because I didn't need to load it to make your code run when I tried it today. – Michael Feb 10 '15 at 15:33
  • @Michael I cannot reproduce your error with the code above. I wonder if you have the latest R, dplyr, and tidyr. Could you make sure you have the latest version and see what happens? – jazzurro Feb 10 '15 at 22:43
  • so I realized that for my data, I didn't want zeros for any hours that weren't included.I used your idea of making a sequence but the heart of my solution looked like for (i in 1:nrow(weRtemp)) list_hours[[i]] <- seq(weRtemp$roundedStart[i], weRtemp$roundedEnd[i], by = 3600) (No idea how to format that better). Then I could just count the hours that made it into the sequences. Thank you very much for your help! – Michael Feb 11 '15 at 00:21
  • @Michael It seems that you found what you want from your data and your own solution. I am glad to hear that. :) – jazzurro Feb 11 '15 at 23:45