Counting the Occurrences of Regular Time Spans in a Set of Time Intervals R

Question

Given a data set like below. I would like to count how many times a particular hour of the day (00:00, 01:00, ...., 22:00, 23:00) falls completely within any of the given intervals.

The date of occurrence doesn't matter. Just the overall count.

### This code is to create a data set similar to the one I am using. 

### This is a function I found on here to generate random times
latemail <- function(N, st="2012/01/01", et="2012/12/31") {
  st <- as.POSIXct(as.Date(st))
   et <- as.POSIXct(as.Date(et))
   dt <- as.numeric(difftime(et,st,unit="sec"))
   ev <- sort(runif(N, 0, dt))
   rt <- st + ev
}

set.seed(123)
startTimes <- latemail(5)
  endTimes <- startTimes +18000

my_data <- data.frame(startTimes, endTimes)

> my_data
                start                 end
1 2012-04-14 16:10:44 2012-04-14 21:10:44
2 2012-05-28 23:38:16 2012-05-29 04:38:16
3 2012-10-14 10:33:10 2012-10-14 15:33:10
4 2012-11-17 23:13:56 2012-11-18 04:13:56
5 2012-12-08 22:29:36 2012-12-09 03:29:36

So that hopefully helps give you an idea of what I am working with.

Ideally the output would be a dataset with one variable for the hour, and another for the count of occurrences. Like this

   hour count
1 00:00     3
2 01:00     3
3   etc     ?

How to doing this in different increments (say 15 minutes) would also be great to know.

Thank you!

It seems unclear to me how you would count "particular hour". You have intervals. For instance, if start time is 16:10:44 and ending time is 21:10:44, do you want to count 16 (+1), 17 (+1), 18 (+1), 19 (+1), 20 (+1) and 21 (+1)? Could you clarify your criteria? — jazzurro, Feb 03 '15 at 01:03
By "falls completely within" I was trying to get across the idea that an hour is only counted if it is entirely contained within the interval. Your example would not count 16:00 or 21:00. Apologies if that was unclear. — Michael, Feb 03 '15 at 01:07

jazzurro · Answer 1 · 2015-02-03T10:14:35.877

Here is my attempt. I am sure there are better ways of doing this. Given the comments above, I did the following. First, I took hour using ifelse. As you described in your commented, I rounded up/down hour here. Using transmute, I want to get a string including hours. In some cases, start hour can be larger than ending hour (in this case the record crosses dates). In order to deal with that, I used setdiff(), c(), and toString(). Using separate I separated hours into columns. I wanted to use cSplit() from the splitstackshape package, but I had an error message coming back. Hence, I chose separate() here. Once I had all hours separated, I reshaped the data using gather() and finally counted hour with count(). filter() was employed to remove NA cases. I hope this will help you to some extent.

** Data ** 

structure(list(startTimes = structure(c(1328621832.79254, 1339672345.94964, 
1343434566.9641, 1346743867.55964, 1355550696.37895), class = c("POSIXct", 
"POSIXt")), endTimes = structure(c(1328639832.79254, 1339690345.94964, 
1343452566.9641, 1346761867.55964, 1355568696.37895), class = c("POSIXct", 
"POSIXt"))), .Names = c("startTimes", "endTimes"), row.names = c(NA, 
-5L), class = "data.frame")

#           startTimes            endTimes
#1 2012-02-07 22:37:12 2012-02-08 03:37:12
#2 2012-06-14 20:12:25 2012-06-15 01:12:25
#3 2012-07-28 09:16:06 2012-07-28 14:16:06
#4 2012-09-04 16:31:07 2012-09-04 21:31:07
#5 2012-12-15 14:51:36 2012-12-15 19:51:36

library(dplyr)
library(tidyr)

mutate(my_data, start = ifelse(as.numeric(format(startTimes, "%M")) >= 0 & as.numeric(format(startTimes, "%S")) > 0,
                               as.numeric(format(startTimes, "%H")) + 1,
                               as.numeric(format(startTimes, "%H"))),
                end = ifelse(as.numeric(format(endTimes, "%M")) >= 0 & as.numeric(format(endTimes, "%S")) > 0,
                             as.numeric(format(endTimes, "%H")) - 1,
                             as.numeric(format(endTimes, "%H"))),
                start = replace(start, which(start == "24"), 0),
                end = replace(end, which(end == "-1"), 23)) %>%
rowwise() %>%
transmute(hour = ifelse(start < end, toString(seq.int(start, end, by = 1)),
                        toString(c(setdiff(seq(0, 23, by = 1), seq.int(end, start, by = 1)),
                                    start, end)))) %>%
separate(hour, paste("hour", 1:24, sep = "."), ", ", extra = "merge") %>%
gather(foo, hour) %>%
count(hour) %>%
filter(complete.cases(hour))

#   hour n
#1     0 2
#2     1 1
#3    10 1
#4    11 1
#5    12 1
#6    13 1
#7    15 1
#8    16 1
#9    17 2
#10   18 2
#11   19 1
#12    2 1
#13   20 1
#14   21 1
#15   22 1
#16   23 2

Thank you for your answer. It would be lovely if your code was a working example. "my_data" isn't actually created and I had to install and load the "stringi" package. I keep getting a "Error: incompatible types, expecting a character vector" when I try to run it on my own data. — Michael, Feb 09 '15 at 23:48
@Michael Thank you for your reply. Could you clarify what you mean by saying `my_data` isn't actually created? I used `dput()` to provide the data I got, which is the chunk beginning with "structure.." in my answer. You need to do `my_data <- structure.....` Have you done that? I did not use the `stringi` package in the code, by the way. Let me know if you need more help. — jazzurro, Feb 10 '15 at 00:36
you are right, adding my_data <- structure..... makes your example run. for some reason, when I try to apply your code to my data (which is using two POSIXct exactly like in my_data), the transmute() function gives me an error message "Error: incompatible types, expecting a numeric vector". Not sure if it is related to this https://github.com/hadley/dplyr/issues/464 or not. thanks for replying to my reply! — Michael, Feb 10 '15 at 15:30
yeah, as far as "stringi" goes I swear I had to download it. but maybe I am just crazy because I didn't need to load it to make your code run when I tried it today. — Michael, Feb 10 '15 at 15:33
@Michael I cannot reproduce your error with the code above. I wonder if you have the latest R, dplyr, and tidyr. Could you make sure you have the latest version and see what happens? — jazzurro, Feb 10 '15 at 22:43
so I realized that for my data, I didn't want zeros for any hours that weren't included.I used your idea of making a sequence but the heart of my solution looked like for (i in 1:nrow(weRtemp)) list_hours[[i]] <- seq(weRtemp$roundedStart[i], weRtemp$roundedEnd[i], by = 3600) (No idea how to format that better). Then I could just count the hours that made it into the sequences. Thank you very much for your help! — Michael, Feb 11 '15 at 00:21
@Michael It seems that you found what you want from your data and your own solution. I am glad to hear that. :) — jazzurro, Feb 11 '15 at 23:45

Counting the Occurrences of Regular Time Spans in a Set of Time Intervals R

1 Answers1