1

I want to merge two data.tables into one. There are three columns in each one table (two coordinates x, y, and a Value assigned to the coordinates) and the other table has two columns (two coordinates x and y). However, the coordinates of the data.tables differ slightly so I want to merge the nearest coordinates. But then only the lowest nearest coordinates.

What I want to do something is very similar to Michiel (this thread) and the solution works for me if I want to have the lower or higher nearest. However, as I explained before I want to roll on lowest nearest, which in the function of data.table is roll = Inf.

Anybody an idea how I could rewrite the function proposed in the other thread or is there another function that can help me.

This is an example of data:

DT1 = data.frame(x=c(1,2,3,4,5,6,7,8,9,9,10),y=c(11,11,11,11,11,12,12,12,12,16,15))
DT2 = data.frame(x=c(1,1,1,6,6,6,10,10),y=c(11,12,13,11,12,13,14,16), name=c("A","B","C","D","E","F","G","H"))

Following the function of the previous thread you will get this:

x    y    name
1:  1 11    A
2:  2 11    A
3:  3 11    A
4:  4 11    D
5:  5 11    D
6:  6 12    E
7:  7 12    E
8:  8 12    E
9:  9 12    G
10:  9 16    G
11: 10 15    G

But what I want is this:

 x  y name
 1:  1 11    A
 2:  2 11    A
 3:  3 11    A
 4:  4 11    **A**  # Lowest nearest value of x:4 is 1 (A) not 6 (E)
 5:  5 11    **A**  # Lowest nearest value of x:5 is 1 (A) not 6 (E)
 6:  6 12    E
 7:  7 12    E
 8:  8 12    E
 9:  9 12    **E**  # Lowest nearest value of x:9 is 6, and of y:12 is 12 thus E
10:  9 16    **F**  # Lowest nearest value of x:9 is 6, and of y:16 is 15 thus F
11: 10 15    G

I hope this clarifies what I want. Thanks

AnoukStats
  • 11
  • 3

1 Answers1

1

Starting data.table v 1.9.8, non-equi joins were introduced. This makes solving such tasks relatively easy and straight-forward. You can basically join by these two coordinates only with the less/equal coordinates from the second data-frame, while picking only the latest match (the nearest one, in other words).

library(data.table) #v 1.10.4
setDT(DT2)[setDT(DT1), on = .(x <= x, y <= y), mult = "last"]
#      x  y name
#  1:  1 11    A
#  2:  2 11    A
#  3:  3 11    A
#  4:  4 11    A
#  5:  5 11    A
#  6:  6 12    E
#  7:  7 12    E
#  8:  8 12    E
#  9:  9 12    E
# 10:  9 16    F
# 11: 10 15    G
David Arenburg
  • 91,361
  • 17
  • 137
  • 196