33

I have this data: (complete for December)

      date     sessions
1   2014-12-01  1932
2   2014-12-02  1828
3   2014-12-03  2349
4   2014-12-04  8192
5   2014-12-05  3188
6   2014-12-06  3277

And a need to subset/filter this, for example from "2014-12-05" to "2014-12-25"

I know that you can create a sequence with the operator ":".

Example: b <- c(1:5)

But How to filter a sequence? I tried this

NewDate <- filter(Dates, date("2014-12-05":"2014-12-12"))

But says:

Error: unexpected symbol in: "NewDate <- filter(Dates, date("2014-12-05":"2014-12-12") NewDate"

MLavoie
  • 9,671
  • 41
  • 36
  • 56
Omar Gonzales
  • 3,806
  • 10
  • 56
  • 120
  • Does this answer your question? [Subset a dataframe between 2 dates](https://stackoverflow.com/questions/23622338/subset-a-dataframe-between-2-dates) – camille Sep 14 '21 at 19:06

4 Answers4

38

you could use subset

Generating your sample data:

temp<-
read.table(text="date     sessions
2014-12-01  1932
2014-12-02  1828
2014-12-03  2349
2014-12-04  8192
2014-12-05  3188
2014-12-06  3277", header=T)

Making sure it's in date format:

temp$date <- as.Date(temp$date, format= "%Y-%m-%d")

temp



 #        date sessions
 # 1 2014-12-01     1932
 # 2 2014-12-02     1828
 # 3 2014-12-03     2349
 # 4 2014-12-04     8192
 # 5 2014-12-05     3188
 # 6 2014-12-06     3277

Using subset :

subset(temp, date> "2014-12-03" & date < "2014-12-05")

which gives:

  #        date sessions
  # 4 2014-12-04     8192

you could also use []:

temp[(temp$date> "2014-12-03" & temp$date < "2014-12-05"),]
jalapic
  • 13,792
  • 8
  • 57
  • 87
30

If you want to use dplyr, you can try something like this.

mydf <- structure(list(date = structure(c(16405, 16406, 16407, 16408, 
16409, 16410), class = "Date"), sessions = c(1932L, 1828L, 2349L, 
8192L, 3188L, 3277L)), .Names = c("date", "sessions"), row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

# Create date object
mydf$date <- as.Date(mydf$date) 

filter(mydf, between(date, as.Date("2014-12-02"), as.Date("2014-12-05")))

#If you avoid using `between()`, the code is simpler.

filter(mydf, date >= "2014-12-02", date <= "2014-12-05")
filter(mydf, date >= "2014-12-02" & date <= "2014-12-05")

#        date sessions
#1 2014-12-02     1828
#2 2014-12-03     2349
#3 2014-12-04     8192
#4 2014-12-05     3188
jazzurro
  • 23,179
  • 35
  • 66
  • 76
  • 2
    I thought the logical condition would be `&`, but looks like the `,` works. It is something new for me. Thanks. – akrun Feb 05 '15 at 04:11
  • 2
    @akrun It seems that both are fine here, doesn't it? I saw both versions in Hadley's dplyr tutorial pdf from UseR! 2014. I will post both versions. I am a bit puzzled by the behaviour of `between`. I wonder why one needs to use `as.Date` again. – jazzurro Feb 05 '15 at 04:15
  • 2
    Same reason `seq.Date(as.Date(x1),as.Date(x2),by="years")` requires it - you need to work on a `Date` object so the data matches. – thelatemail Feb 05 '15 at 05:06
  • 1
    @thelatemail That example cleared up my mind. Thank you very much. :) – jazzurro Feb 05 '15 at 05:07
  • 1
    @akrun Pleasure as always. It seems that `between` in `data.table` saves some typing. – jazzurro Feb 05 '15 at 05:15
12

An option using data.table

 library(data.table)
 setDT(df)[date %between% c('2014-12-02', '2014-12-05')]
 #         date sessions
 #1: 2014-12-02     1828
 #2: 2014-12-03     2349
 #3: 2014-12-04     8192
 #4: 2014-12-05     3188

This should work even if the "date" is "character" column

 df$date <- as.character(df$date)
 setDT(df)[date %between% c('2014-12-02', '2014-12-05')]
 #       date sessions
 #1: 2014-12-02     1828
 #2: 2014-12-03     2349
 #3: 2014-12-04     8192
 #4: 2014-12-05     3188

In case if we wanted to subset exclusive of the range

  setDT(df)[between(date, '2014-12-02', '2014-12-05', incbounds=FALSE)]
  #         date sessions
  #1:  2014-12-03     2349
  #2:  2014-12-04     8192

data

 df <-  structure(list(date = structure(c(16405, 16406, 16407, 16408, 
 16409, 16410), class = "Date"), sessions = c(1932L, 1828L, 2349L, 
 8192L, 3188L, 3277L)), .Names = c("date", "sessions"), row.names = c("1", 
 "2", "3", "4", "5", "6"), class = "data.frame")
Community
  • 1
  • 1
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I've been looking for the fastest date filtering option to use inside `ggplot::geom_xxxx(data=DT[])`. Quick research suggests this would be the fastest option: does it matter in terms of performance to use %between% or between()? Is it using as.Date() in the internals? What I'm doing: 'ggplot(DT, aes(date, y_var)) + geom_step() + geom_step(data = DT[,DATE %in% ymd("2017-02-01"):ymd("2017-10-31"),] , aes(date, yvar), col='black', size=1.6 )' This is just to highlight the current (2017) year line: "black" and size bigger. – Dan Aug 18 '17 at 00:12
  • 2
    @Dan If you are using the `between` from `data.table` i.e. `between(x, lower, upper, incbounds=TRUE)# x %between% y` – akrun Aug 18 '17 at 03:32
  • Does %between% transform the date character to a true date? What's the difference between using data.table's %between% and using lubridate's %within%? – skan Feb 14 '21 at 17:39
0

With lubridate,

mydates <- interval(start = "2014-12-05", end = "2014-12-25")
NewDate <- Dates[which(date %within% mydates),]
dez93_2000
  • 1,730
  • 2
  • 23
  • 34
  • You don't need `which`. – hmhensen Dec 13 '19 at 23:42
  • 2
    Disagree. `Which` indexes the rows which fit the condition and returns a subset of those. Otherwise the subsetting index is a vector of TRUE/FALSE, but NA rows will be neither T nor F and thus return all-NA rows to the result. – dez93_2000 Dec 14 '19 at 03:25
  • You're right. Interesting. I thought subsetting would only return TRUEs instead of just not returning FALSEs (since NA is a logical too). Been using `dplyr` too much I guess. Using it in `filter` doesn't return the NA. Thanks. – hmhensen Dec 14 '19 at 03:56