1) Base R
Let yrs
be all unique years in the data and targets
be each of those years with the target's month and day. Then create dates
which contains all dates within delta
days of any value in targets
. Note that sapply
strips dates
of its "Date"
class but that does not matter since it is only subsequently used in %in%
and that ignores the class. Finally subset DF
down to those rows whose DATE
is in dates
. No packages are used.
# inputs (also DF defined in Note at end)
target <- "06-19"
delta <- 5
DATE <- as.Date(DF$DATE)
yrs <- unique(format(DATE, "%Y"))
targets <- as.Date(paste(yrs, target, sep = "-"))
dates <- c(sapply(targets, "+", seq(-delta, delta)))
DF[DATE %in% dates, ]
giving:
DATE A B C
5 1996-06-14 12:00:00 200.3 35.9 1.5
6 1996-06-15 12:00:00 138.9 15.1 0.0
2) sqldf
Alternately, this can be done using a single SQL statement. Note that we assume that the DATE
column is character since the question referred to it being in a particular format. Now, using the same inputs the inner select generates target dates from each year and then the outer select joins DF
to those rows within delta
days of any target date. We use the H2 database backend here since it has better date support than SQLite.
library(sqldf)
library(RH2)
# inputs (also DF defined in Note at end)
target <- "06-19"
delta <- 5
fn$sqldf("select DF.* from DF
join (select distinct cast(substr(DATE, 1, 4) || '-' || '$target' as DATE) as target
from DF)
on cast(substr(DATE, 1, 10) as DATE) between target - $delta and target + $delta")
giving:
DATE A B C
1 1996-06-14 12:00:00 200.3 35.9 1.5
2 1996-06-15 12:00:00 138.9 15.1 0.0
We could simplify the SQL somewhat if DATE
is of R's "Date"
class. That is, replace the sqldf
statement above with:
DF2 <- transform(DF, DATE = as.Date(DATE))
fn$sqldf("select DF2.* from DF2
join (select distinct cast(year(DATE) || '-' || '$target' as DATE) as target from DF2)
on DATE between target - $delta and target + $delta")
giving:
DATE A B C
1 1996-06-14 200.3 35.9 1.5
2 1996-06-15 138.9 15.1 0.0
Note
The input DF
is assumed to be:
DF <- structure(list(DATE = c("1996-06-10 12:00:00", "1996-06-11 12:00:00",
"1996-06-12 12:00:00", "1996-06-13 12:00:00", "1996-06-14 12:00:00",
"1996-06-15 12:00:00"), A = c(178, 184.1, 187.2, 194.4, 200.3,
138.9), B = c(24.1, 30.2, 29.4, 35, 35.9, 15.1), C = c(1.7, 1.1,
1.8, 5.3, 1.5, 0)), .Names = c("DATE", "A", "B", "C"), row.names = c(NA,
-6L), class = "data.frame")