I have two data frames in R and I would like to conditionally merge them on id
and day
. The merge is that the right
variables merged to the left
variables be as new/fresh/recent as possible, but must be at least three days old.
But, if there isn't a match in right
to my id
-date
pair in left
I'd still like to retain them. My study has two parts, so I don't want to drop the id
-day
observations just because they're not complete.
Can I do this in one sqldf
step? My current approach requires an additional base R merge
.
left <- data.frame(id=rep(1:5, each=10),
day=rep(1:10, times=5),
x=rnorm(5*10))
right <- data.frame(id=rep(1:2, each=21),
day=rep(-10:10, times=2),
y=rnorm(2*21))
combined <- sqldf("SELECT L.id, L.day, L.x, R.y
FROM left L LEFT OUTER JOIN right R
ON (L.id = R.id)
WHERE ((L.day - R.day) >= 3)
GROUP BY L.id, L.day
HAVING (R.day = MAX(R.day))")
combined
combined.2 <- merge(left, combined, all=TRUE)
combined.2