I have two tables that I would like to merge based on an exact match on one variable and a fuzzy match on another.
Consider the two tables below. For each id1 in dt1, I would like to find an id2 in dt2 that matches exactly on size and where the date value in dt2 is either equal to or later than the date field in dt1. If there are multiple matches, I would like to have one randomly selected.
dt1 <- data.table(c("A", "B"), c(2, 3), as.Date(c("2013-03-27", "2014-05-08"), format = '%Y-%m-%d'))
setnames(dt1, c("V1", "V2", "V3"),
c("id1", "size", "date"))
dt2 <- data.table(1:10, c(2, 4, 3, 2, 2, 2, 3, 2, 4, 4), as.Date(c("2014-02-25", "2011-08-02", "2014-06-21", "2013-11-29", "2012-02-21", "2011-12-02",
"2014-04-22", "2011-03-05", "2014-04-21", "2014-10-29"), format = '%Y-%m-%d'))
setnames(dt2, c("V1", "V2", "V3"),
c("id2", "size", "date"))
The resulting table could look like this:
id1 size date id2
1: A 2 2013-03-27 1
2: B 3 2014-05-08 3
or like this (depending on the random selection)
id1 size date id2
1: A 2 2013-03-27 4
2: B 3 2014-05-08 3