find the N closest rows in another table without using lateral join

Question

I have two tables and I need to find the rows from the second one which are closest to the ones in the first table, measured by some easy-to-compute metric, e.g. the two closest rows measured by difference.

Starting with

create table t1 (id int, name text, x int);
create table t2 (id int, name text, x int);
insert into t1 values (1, 'foo', 10), (2, 'bar', 3);
insert into t2 values (1, 'a', 10), (2, 'b', 11), (1, 'c', 7);

I want to get to something like

name	name
foo	a
foo	b
bar	c
bar	a

since the values of rows a and b in t2 (10 and 11, respectively) are the two closest rows to the value of foo in t1.

Following this advice, I tried

select *
from t1
left join t2
on t2.id = (
  select id
  from t2
  where t2.id==t1.id
  order by abs(t2.x-t1.x) desc
  limit 2)

That does not work: I get the error no such column t1.x. (Mind you, this query works if I order by something more straightforward in the subquery, such as name, as it was suggested by the original poster).

I am guessing I could use lateral join in PostgreSQL for this (as it was suggested here), but for now it seems I am constrained to sqlite.

What are the volumes (number of rows) of your two tables t1 and t2? — lemon, Jul 08 '22 at 12:39

lemon · Answer 1 · 2022-07-08T13:01:08.137

One option is to compute the cartesian product of your two tables, then extract the first two matching rows for each "t1.x" value that have the least difference of values with respect to "t2.x":

the cartesian product is obtained using 1=1 inside the ON condition (which is always true for each combination of t1 row and t2 row)
getting the first matching rows is obtained by ordering on your ranking value (computed with ROW_NUMBER) and limiting the number of rows to the amount of t1 rows multiplied by 2 (the first two t2 rows for each t1 row).

SELECT t1.name AS namet1,
       t2.name AS namet2
FROM       t1
INNER JOIN t2 
        ON 1=1 
ORDER BY ROW_NUMBER() OVER(PARTITION BY t1.id ORDER BY ABS(t1.x-t2.x))
LIMIT (SELECT COUNT(*) FROM t1) *2

Check the demo here.

Note: this solution is somewhat expensive given that it involves a cartesian product, yet there may be no other solution to check for the minimum value of a table with respect to another table other than checking the difference between the each value of the first table for each value of the second table.

find the N closest rows in another table without using lateral join

1 Answers1