We could use join_by()
dplyr >= 1.1.0
We now could do
Rolling joins
Rolling joins are a variant of inequality joins that limit the results returned from an inequality join condition. They are useful for "rolling" the closest match forward/backwards when there isn't an exact match. To construct a rolling join, wrap an inequality with closest()
.
closest(expr)
expr must be an inequality involving one of: >, >=, <, or <=
For example, closest(x >= y)
is interpreted as: For each value in x, find the closest value in y that is less than or equal to that x value.
closest()
will always use the left-hand table (x) as the primary table, and the right-hand table (y) as the one to find the closest match in, regardless of how the inequality is specified. For example, closest(y$a >= x$b) will always be interpreted as closest(x$b <= y$a).
See ?join_by()
library(dplyr) #>= 1.1.0
left_join(df1, df2, join_by(closest(num >= num)))
num.x Lat Long num.y dives underwate
1 2 30.89878 -88.78650 1 5 6.77
2 6 30.89760 -88.97650 4 9 5.99
3 8 30.89786 -88.67450 7 1 12.44
4 12 40.88764 -88.76450 9 22 23.20
5 15 30.98786 -88.87645 14 16 1.50
6 18 30.87650 -88.53726 16 11 2.40
7 22 30.62534 -88.66543 20 19 22.40
8 25 30.64536 -88.43434 24 7 18.60
9 29 30.98764 -88.76285 27 9 11.20
10 32 30.87465 -88.23488 31 23 197.70
data.table
library(data.table)
# Convert your dataframes to data.tables
dt1 <- as.data.table(df1)
dt2 <- as.data.table(df2)
# Set keys for rolling join
setkey(dt1, num)
setkey(dt2, num)
# Perform the rolling join
result <- dt2[dt1, roll = Inf]
# combine
cbind(dt2[,1, with = FALSE], result)
data:
df1 <- structure(list(num = c(2L, 6L, 8L, 12L, 15L, 18L, 22L, 25L, 29L,
32L), Lat = c(30.89878, 30.8976, 30.897865, 40.88764, 30.98786,
30.8765, 30.62534, 30.64536, 30.98764, 30.87465), Long = c(-88.7865,
-88.9765, -88.6745, -88.7645, -88.87645, -88.53726, -88.665433,
-88.434337, -88.76285, -88.23488)), class = "data.frame", row.names = c(NA,
-10L))
df2 <- structure(list(num = c(1L, 4L, 7L, 9L, 14L, 16L, 20L, 24L, 27L,
31L), dives = c(5L, 9L, 1L, 22L, 16L, 11L, 19L, 7L, 9L, 23L),
underwate = c(6.77, 5.99, 12.44, 23.2, 1.5, 2.4, 22.4, 18.6,
11.2, 197.7)), class = "data.frame", row.names = c(NA, -10L
))