
I am struggling with the following example:

    library(dplyr)
    library(lubridate)

    time <- c('2013-01-03 21:59:21.549', '2013-01-04 22:00:21.549', '2013-01-05 22:01:21.222', '2013-01-06 22:06:23.559')
    value <- c(1, 2, 3, 4)

    data <- tibble(time, value)
    data <- data %>% mutate(time = ymd_hms(time))

    > data
    # A tibble: 4 × 2
                     time value
                   <dttm> <dbl>
    1 2013-01-03 21:59:21     1
    2 2013-01-04 22:00:21     2
    3 2013-01-05 22:01:21     3
    4 2013-01-06 22:06:23     4

How can I write a dplyr::filter statement that only keeps observations between 21:59 and 22:01 (inclusive) every day?

Filtering on hour(time) and minute(time) separately does not seem to work very well here.

Am I missing something here?

Expected output: rows 1, 2 and 3 only. Thanks!

ℕʘʘḆḽḘ

5 Answers


2019 is here! Here is a better (and simpler) solution using as.hms. The tz argument is mandatory.

    library(dplyr)
    library(lubridate)
    library(hms)

    time_str = c('2013-01-03 21:59:21.549', '2013-01-04 22:00:21.549', '2013-01-05 22:01:21.222', '2013-01-06 22:06:23.559')
    value = c(1,2,3,4)
    data <- tibble(time_str, value)

    data %>%  mutate(timestamp_utc = ymd_hms(time_str, tz = 'UTC'),
                     timestamp_est = with_tz(timestamp_utc, 'America/New_York'),
                     time_est = as.hms(timestamp_est, tz = 'America/New_York')) %>% 
      filter(time_est >= hms::as.hms('16:59:00', tz = 'America/New_York'),
             time_est <= hms::as.hms('17:01:00', tz = 'America/New_York'))

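will do the job. Note that the upper bound `17:01:00` excludes `17:01:21.222` (row 3, 22:01:21.222 UTC), so only rows 1 and 2 are kept below; to keep the whole 17:01 minute, use `time_est < hms::as.hms('17:02:00', tz = 'America/New_York')` as the upper bound instead.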

    # A tibble: 2 x 5
      time_str                value timestamp_utc           timestamp_est           time_est 
      <chr>                   <dbl> <dttm>                  <dttm>                  <time>   
    1 2013-01-03 21:59:21.549     1 2013-01-03 21:59:21.549 2013-01-03 16:59:21.549 16:59.549
    2 2013-01-04 22:00:21.549     2 2013-01-04 22:00:21.549 2013-01-04 17:00:21.549 17:00.549
ℕʘʘḆḽḘ
  • This is great, after a lot of stumbling around I found your question and your answer and it works for the thing I need. Thank you. – damo Oct 02 '19 at 09:09
  • thanks bro. you can upvote both question and answer if useful to you :) – ℕʘʘḆḽḘ Oct 02 '19 at 12:00
  • as.hms() has been replaced by as_hms(), which is no longer generic and also does not have a tz argument. It also uses the time zone of the argument for conversion, not the current system's timezone. Change the timezone before converting if necessary, e.g. using lubridate::with_tz(). – KKW Oct 17 '20 at 16:17
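Following the comment above, here is a minimal sketch of the same filter with the newer `hms::as_hms()`, assuming hms 0.5.0 or later. Because `as_hms()` takes no `tz` argument and uses the time zone of its input, the timestamp is moved to America/New_York with `lubridate::with_tz()` first. The upper bound is written as `< 17:02:00` so that the whole 17:01 minute (row 3) is kept.

```r
library(dplyr)
library(lubridate)
library(hms)

time_str <- c("2013-01-03 21:59:21.549", "2013-01-04 22:00:21.549",
              "2013-01-05 22:01:21.222", "2013-01-06 22:06:23.559")
data <- tibble(time_str, value = c(1, 2, 3, 4))

result <- data %>%
  mutate(
    timestamp_utc = ymd_hms(time_str, tz = "UTC"),
    # as_hms() uses the time zone of its argument, so convert the zone first
    timestamp_est = with_tz(timestamp_utc, "America/New_York"),
    time_est = as_hms(timestamp_est)
  ) %>%
  # compare time of day only; '< 17:02:00' keeps the whole 17:01 minute
  filter(time_est >= as_hms("16:59:00"), time_est < as_hms("17:02:00"))

result$value
```

In January New York is on EST (UTC-5), so 16:59 to 17:01 local time corresponds to the 21:59 to 22:01 UTC window from the question.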

You mentioned that hour() and minute() from lubridate don't work, but I'm not sure I see why. Does this not work?

    filter(data,
           (hour(time) == 21 & minute(time) == 59) |
           (hour(time) == 22 & minute(time) <= 1))

An alternative is to compute the number of seconds elapsed in the day and compare it with the number of seconds corresponding to your desired limits. A POSIXct value stores seconds since 1970-01-01 00:00:00 UTC, so the number of seconds elapsed since midnight (UTC) is the total number of seconds modulo the seconds in a day: `as.numeric(data$time) %% (60 * 60 * 24)`. The number of seconds until (the start of the minute) 21:59 is `60 * 60 * 21 + 60 * 59`, and the number of seconds until (the end of the minute) 22:01 is `60 * 60 * 22 + 60 + 59`. Combining everything:

    filter(data, between(as.numeric(time) %% (60 * 60 * 24),
                         60 * 60 * 21 + 60 * 59,
                         60 * 60 * 22 + 60 + 59))
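The timestamps in the question carry fractional seconds, so an inclusive upper bound at 22:01:59 would drop a value such as 22:01:59.5. Below is a base-R sketch of the same seconds-of-day comparison that uses an exclusive bound at 22:02:00 instead. It assumes the timestamps are meant in UTC, because the modulo is always taken relative to UTC midnight whatever the `tz` attribute says.

```r
time <- as.POSIXct(c("2013-01-03 21:59:21.549", "2013-01-04 22:00:21.549",
                     "2013-01-05 22:01:21.222", "2013-01-06 22:06:23.559"),
                   tz = "UTC")
data <- data.frame(time = time, value = c(1, 2, 3, 4))

# seconds since midnight UTC: POSIXct counts seconds since 1970-01-01 00:00:00 UTC
seconds_of_day <- as.numeric(data$time) %% (60 * 60 * 24)

lower <- 60 * 60 * 21 + 60 * 59  # 21:59:00, inclusive
upper <- 60 * 60 * 22 + 60 * 2   # 22:02:00, exclusive, so all of 22:01 is kept

result <- data[seconds_of_day >= lower & seconds_of_day < upper, ]
result$value
```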
konvas
  • hey wait a minute! lemme think :) – ℕʘʘḆḽḘ Jan 04 '17 at 14:08
  • ok yeah, what about filtering between `19:59` and `23:38` with your solution? :D – ℕʘʘḆḽḘ Jan 04 '17 at 14:09
  • In that case you'd have to change the filtering condition to something like `(hour(time) == 19 & minute(time) == 59) | between(hour(time), 20, 22) | (hour(time) == 23 & minute(time) <= 38)`. I agree it gets a bit ugly. I'll post another method using seconds comparison – konvas Jan 04 '17 at 14:19
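As the comment thread suggests, case analysis on hours and minutes gets unwieldy for wider windows. One way around it, sketched here with the same lubridate functions, is to combine them into minutes since midnight so that any window becomes a single `between()`. `minute_of_day` is a helper name made up for this sketch, and times are compared in UTC, the default zone of `ymd_hms()`.

```r
library(dplyr)
library(lubridate)

data <- tibble(
  time = ymd_hms(c("2013-01-03 21:59:21.549", "2013-01-04 22:00:21.549",
                   "2013-01-05 22:01:21.222", "2013-01-06 22:06:23.559")),
  value = c(1, 2, 3, 4)
)

# hypothetical helper: minutes elapsed since midnight in the timestamp's zone
minute_of_day <- function(time) hour(time) * 60 + minute(time)

# 21:59 to 22:01, both end minutes included
window_a <- filter(data, between(minute_of_day(time), 21 * 60 + 59, 22 * 60 + 1))

# 19:59 to 23:38, both end minutes included
window_b <- filter(data, between(minute_of_day(time), 19 * 60 + 59, 23 * 60 + 38))
```

With this data, `window_a` keeps rows 1, 2 and 3, and `window_b` keeps all four rows, since every timestamp falls between 21:59 and 22:06.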

Another idea would be the following. You can create a numeric vector using hour, minute, and second. You can extract them with format() and convert character to numeric. Then, you subset the data with the two numbers indicating the time range you want (i.e., 215900, 220100).

    library(dplyr)

    data %>%
      mutate(foo = as.numeric(format(time, "%H%M%S"))) %>%
      filter(between(foo, 215900, 220100)) %>%
      select(-foo)

    #                 time value
    #               <dttm> <dbl>
    #1 2013-01-03 21:59:21     1
    #2 2013-01-04 22:00:21     2
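The upper bound 220100 stops at 22:01:00, which is why row 3 (22:01:21) is missing from the output above. Here is a base-R sketch of the same HHMMSS trick with 220159 as the inclusive upper bound, assuming UTC timestamps. Because `%S` drops fractional seconds, a value like 22:01:59.549 would still map to 220159 and be kept. A time such as 04:34:23 becomes 43423, and losing the leading zero does not change the ordering.

```r
time <- as.POSIXct(c("2013-01-03 21:59:21.549", "2013-01-04 22:00:21.549",
                     "2013-01-05 22:01:21.222", "2013-01-06 22:06:23.559"),
                   tz = "UTC")
data <- data.frame(time = time, value = c(1, 2, 3, 4))

# time of day as a number: 21:59:21 -> 215921 (format() uses the UTC tz attribute)
hhmmss <- as.numeric(format(data$time, "%H%M%S"))

# 21:59:00 up to and including 22:01:59
result <- data[hhmmss >= 215900 & hhmmss <= 220159, ]
result$value
```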
jazzurro
  • Sounds like a good idea, as long as the numeric representation mirrors the string representation for any time. In other words, `04:34:23` should be `43423`? – ℕʘʘḆḽḘ Jan 04 '17 at 13:53
  • @Noobie The first zero will be dropped when you convert character to numeric. So, that would not be an issue, I think. When your timestamps are in a.m. and p.m., my idea would not be a good choice. – jazzurro Jan 04 '17 at 13:58

I guess this solves your problem:

    library(dplyr)

    result <- data %>%
      mutate(time2 = format(time, format = "%H:%M:%S")) %>%
      filter(time2 >= "21:59:00" & time2 < "22:02:00") %>%
      select(-time2)
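Comparing strings works here because `%H:%M:%S` is fixed-width and zero-padded, so string order within a day matches chronological order. The exclusive bound `"22:02:00"` keeps the whole 22:01 minute. Here is a base-R sketch of the same idea, assuming UTC timestamps:

```r
time <- as.POSIXct(c("2013-01-03 21:59:21.549", "2013-01-04 22:00:21.549",
                     "2013-01-05 22:01:21.222", "2013-01-06 22:06:23.559"),
                   tz = "UTC")
data <- data.frame(time = time, value = c(1, 2, 3, 4))

# zero-padded "HH:MM:SS" strings sort in the same order as times of day
time2 <- format(data$time, "%H:%M:%S")

result <- data[time2 >= "21:59:00" & time2 < "22:02:00", ]
result$value
```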
Felix Grossmann

Another way to filter on a time window is to convert each timestamp to minutes or seconds within the day (with the time laid out from 0000 to 2400), store the result in a new variable, and filter on that variable.
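Here is a minimal sketch of that idea. It assumes "0000 - 2400" means an HHMM number (21:59 becomes 2159) built from lubridate's `hour()` and `minute()`. The column name `hhmm` is made up for illustration.

```r
library(dplyr)
library(lubridate)

data <- tibble(
  time = ymd_hms(c("2013-01-03 21:59:21.549", "2013-01-04 22:00:21.549",
                   "2013-01-05 22:01:21.222", "2013-01-06 22:06:23.559")),
  value = c(1, 2, 3, 4)
)

result <- data %>%
  # hypothetical helper column: time of day as HHMM, e.g. 21:59 -> 2159
  mutate(hhmm = hour(time) * 100 + minute(time)) %>%
  # both end minutes are included, so 22:01:21 is kept
  filter(between(hhmm, 2159, 2201)) %>%
  select(-hhmm)

result$value
```

HHMM values skip 2160 to 2199, but their order still matches the time of day, so range filters work. Differences between HHMM values are not durations, though; use minutes or seconds since midnight for that.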

Antex