I was wondering is there were a simple way to filter a dataset to only keep records for nearest intervals.
My data looks like this:
library(tidyverse)
library(lubridate)
df <- data.frame(country = rep(c("Spain","Portugal"), each = 4),
type = rep(c("1","2"), each = 4),
name = rep(c("A","B"), each = 4),
event_start = as.Date(c("2012-07-13", "2014-09-05", "2016-12-23", "2017-01-01", "2015-11-27", "2014-06-27", "2013-04-11", "2012-11-27")),
event_end = as.Date(c("2014-09-04", "2016-12-22", "2016-12-31", "2017-01-09", "2016-02-10", "2014-11-26", "2014-06-26", "2013-04-10")),
start = rep(as.Date(c("2008-10-01", "2017-01-01")), each = 4),
end = rep(as.Date(c("2008-12-31", "2017-12-31")), each = 4),
stringsAsFactors = FALSE) %>%
mutate(event_interval = interval(event_start, event_end),
int = interval(start, end))
Expected result:
country type name, event_start, event_end, start, end, event_interval, int
Spain 1 A 2012-07-13 2014-09-04 2008-10-01 2008-12-31 2012-07-13 UTC--2014-09-04 UTC 2008-10-01 UTC--2008-12-31 UTC
Portugal 2 B 2015-11-27 2016-02-10 2017-01-01 2017-12-31 2015-11-27 UTC--2016-02-10 UTC 2017-01-01 UTC--2017-12-31 UTC
In essence, I would like to keep — for each combination of country
/type
/name
— the row for which event_interval
is the nearest to int
.
I've tried (and kind of succeeded) with a not-so-good-looking for loop
but was wondering if you knew of something more straightforward with dplyr
?
Cheers
EDIT
To clarify, in the example above, event_int
and int
don't intersect, but this is not always true in my entire, proper dataset. In fact, for many combinations of country
/type
/name
, there may be several event_int
overlapping with int
, so I really need to find which event_int
is the most similar to int
, i.e. the event_int
that overlaps the most with int
OR is the nearest to int
.