1

I have a set of traffic data that has date and time columns, however, I'm having issues to properly subset the data according to the specific times. Is there a way to properly subset data based on date and time ranges? Using a filter or subset does not seem to work for me.

For e.g. I would like to extract data from 17/08/2019 to 19/08/2019 and for the following time periods: 06:00 to 07:00, 08:30 to 10:00, 12:00 to 13:00, 17:30 to 19:00, 19:00 to 20:00 and 20:00 to 22:00. I appreciate everyone's advice, please!

Vehicle.No. Date Time Payment.Amount
SXX0001A 17/08/2019 00:01 1.25
SXX0002A 17/08/2019 00:21 5
SXX0003A 17/08/2019 00:31 0
SXX0004A 17/08/2019 02:01 3
SXX0005A 17/08/2019 03:01 2
SXX0006A 17/08/2019 18:01 1.25
.
.
.
SXX0007A 18/08/2019 00:01 1.25
SXX0008A 18/08/2019 02:01 1.25
SXX0009A 18/08/2019 19:01 1.25
SXX0010A 18/08/2019 20:01 1.25
.
.
.
SXX0006A 20/08/2019 02:01 1.25
SXX0006A 20/08/2019 03:01 3.25
SXX0006A 20/08/2019 01:01 5.25
SXX0006A 20/08/2019 12:01 0
SXX0006A 20/08/2019 14:01 1.25
.
.
.

MLavoie
  • 9,671
  • 41
  • 36
  • 56
jason
  • 43
  • 6

1 Answers1

1

The first thing is to make sure that your Date and Time variables are in date and time formats respectively. It is impossible to tell, from what you are providing, whether this is the case or whether those variables are characters or factors.

Let's assume that they are characters:

df <- read.table(
  text = 
"Vehicle.No. Date Time Payment.Amount
SXX0001A 17/08/2019 00:01 1.25
SXX0002A 17/08/2019 00:21 5
SXX0003A 17/08/2019 00:31 0
SXX0004A 17/08/2019 02:01 3
SXX0005A 17/08/2019 03:01 2
SXX0006A 17/08/2019 18:01 1.25
SXX0007A 18/08/2019 00:01 1.25
SXX0008A 18/08/2019 02:01 1.25
SXX0009A 18/08/2019 19:01 1.25
SXX0010A 18/08/2019 20:01 1.25
SXX0006A 20/08/2019 02:01 1.25
SXX0006A 20/08/2019 03:01 3.25
SXX0006A 20/08/2019 01:01 5.25
SXX0006A 20/08/2019 12:01 0
SXX0006A 20/08/2019 14:01 1.25",
stringsAsFactors = F,
header = T
)

str(df$Date)
chr [1:15] "17/08/2019" "17/08/2019" "17/08/2019" "17/08/2019" ...

str(df$Time)
chr [1:15] "00:01" "00:21" "00:31" "02:01" "03:01" "18:01" "00:01" "02:01" ...

Let's create 2 new variables (date and datetime) in date and datetime formats. I am creating a datetime variable rather than a time one because this will come in handy later. The package readr has great functions to parse vectors.

library(dplyr)
library(readr)

df <-
  df %>%
  mutate(
    date = parse_date(Date, "%d/%m/%Y"),
    datetime = parse_datetime(paste(Date, Time), "%d/%m/%Y %H:%M")
  )

str(df$date)
Date[1:15], format: "2019-08-17" "2019-08-17" "2019-08-17" ...

str(df$datetime)
POSIXct[1:15], format: "2019-08-17 00:01:00" "2019-08-17 00:21:00" ...

It is not clear to me how you want your output (do you want to filter the data that fit in any of the times you list? or do you want to filter for each date and time period separately?). Let's assume that you want all of the data that fit in any of the date and time periods you list.

Since we need to filter for the same time periods for several days, we will use purrr to avoid code repetition:

  1. create a list of filtered data frames (each element corresponding to one of the days of interest)
  2. create a function that will filter data for all the time periods of interest for a certain day. This function uses the package lubridate.
  3. apply the function to each element of the list and output a data frame thanks to purrr:map_df() and remove the variables time and datetime we had created (though maybe you should keep them and get rid of your Date and Time variables instead).
library(purrr)
library(lubridate)

ls <- list(
  filter(df, date == "2019-08-17"),
  filter(df, date == "2019-08-18"),
  filter(df, date == "2019-08-19")
)


select_times <- function(df) {
  df %>%
    filter(
      datetime %within% interval(paste(unique(df$date), "06:00:00"),
                                 paste(unique(df$date), "07:00:00")) |
      datetime %within% interval(paste(unique(df$date), "08:30:00"),
                                 paste(unique(df$date), "10:00:00")) |
      datetime %within% interval(paste(unique(df$date), "12:00:00"),
                                 paste(unique(df$date), "13:00:00")) |
      datetime %within% interval(paste(unique(df$date), "17:30:00"),
                                 paste(unique(df$date), "22:00:00"))
    )
}

map_df(ls, select_times) %>%
  select(- date, - datetime)

Output:

  Vehicle.No.       Date  Time Payment.Amount
1    SXX0006A 17/08/2019 18:01           1.25
2    SXX0009A 18/08/2019 19:01           1.25
3    SXX0010A 18/08/2019 20:01           1.25

This is the subset of your data for the time periods of interest during the days of interest.

For alternative solutions, you might want to look at the package xts. This post could be useful.

prosoitos
  • 6,679
  • 5
  • 27
  • 41