How to calculate the nearest neighbor distance for 10000 points in a table

Question

I am using PostgreSQL and I am using PostGIS extension.

I am able to compare one point with this query:

SELECT st_distance(geom, 'SRID=4326;POINT(12.601828337172 50.5173393068512)'::geometry) as d
FROM pointst1
ORDER BY d

but I want to compare not to one fixed point but to a column of points. And I want to do this with some sort of indexing so that it is computationally cheap and not 10000x10000 like a cross join within that table.

Create table:

create table pointst1
(
  id   integer not null
    constraint pointst1_id_pk
    primary key,
  geom geometry(Point, 4325)
);

create unique index pointst1_id_uindex
  on pointst1 (id);

create index geomidx
  on pointst1 (geom);

Edit: Refined query (comparing 10000 points with their nearest neighbor but getting the result of the point itself which is 0 and not the next nearest point:

select points.*,
  p1.id as p1_id,
  ST_Distance(geography(p1.geom), geography(points.geom)) as distance
from
  (select distinct on(p2.geom)*
  from pointst1 p2
  where p2.id is not null) as points
cross join lateral
  (select id, geom
  from pointst1
  order  by points.geom <-> geom
           limit 1) as p1;

I'm not sure what you're trying to get - some sample inputs and their corresponding output might be helpful? — Tanktalus, Jun 22 '18 at 16:08
Is there a reason for you to be using different SRS? (4326 and 4325). I believe your query is already doing what you want, just the index isn't the proper one... I will add it in an answer. — Jim Jones, Jun 22 '18 at 16:08
@Gentsview I also thought so. In my answer I considered all geometries as `4326` — Jim Jones, Jun 23 '18 at 08:14

Jim Jones · Answer 1 · 2018-06-23T08:38:10.620

Your query is already calculating the distance from the given geometry to all records in the table pointst1.

Considering these values ..

INSERT INTO pointst1 VALUES (1,'SRID=4326;POINT(16.19 48.21)'),
                            (2,'SRID=4326;POINT(18.96 47.50)'),
                            (3,'SRID=4326;POINT(13.47 52.52)'),
                            (4,'SRID=4326;POINT(-3.70 40.39)');

... if you run your query, it will already calculate the distance from all points in the table:

SELECT ST_Distance(geom, 'SRID=4326;POINT(12.6018 50.5173)'::geometry) as d
FROM pointst1
ORDER BY d

        d         
------------------
  2.1827914536208
 4.26600662563949
 7.03781262396208
 19.1914274750473
(4 Zeilen)

Change your index to GIST, which is the most suitable for geometry data:

create index geomidx on pointst1 using GIST (geom);

Just note that an index won't speed up this query of yours, since you're doing a full scan. But as soon as you start playing more in the where clause, you might see some improvement.

EDIT:

WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1) 
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2  

 id | id2 |        d         
----+-----+------------------
  1 |   2 | 2.85954541841881
  1 |   3 |  5.0965184194703
  1 |   4 | 21.3720495039666
  2 |   1 | 2.85954541841881
  2 |   3 | 7.43911957156222
  2 |   4 | 23.7492673571207
  3 |   1 |  5.0965184194703
  3 |   2 | 7.43911957156222
  3 |   4 | 21.0225069865609
  4 |   1 | 21.3720495039666
  4 |   2 | 23.7492673571207
  4 |   3 | 21.0225069865609
(12 rows)

Removing duplicate distances:

SELECT DISTINCT ON(d) * FROM (
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1) 
SELECT id,j.id2,ST_Distance(geom, j.geom2) AS d
FROM pointst1,j
WHERE id <> j.id2
ORDER BY id,id2) AS j

 id | id2 |        d         
----+-----+------------------
  1 |   2 | 2.85954541841881
  3 |   1 |  5.0965184194703
  3 |   2 | 7.43911957156222
  4 |   3 | 21.0225069865609
  4 |   1 | 21.3720495039666
  2 |   4 | 23.7492673571207
(6 rows)

yes my query is calculating the distance from the given geometry which is one point to all records in the table but I need it to calculate the distance between all the points within that table. — luftgekuhltlover, Jun 23 '18 at 08:18
you mean, you need the distance from every point in this table to all other records? e.g. a-b, a-c, a-d, a-e,...a-z — Jim Jones, Jun 23 '18 at 08:21
yes exactly, but since the distance is the same as a-b and b-a we can skip b-a. — luftgekuhltlover, Jun 23 '18 at 08:27
@Gentsview I just added another query to my answer. Is that what you mean? It still repeats inverted pairs.. but it can also be improved. Just to make sure it is what you want to achieve — Jim Jones, Jun 23 '18 at 08:29
this looks good but I need to find only the 1-NN for each of these 10,000 points: say we compare id:1 with all of other points and the output to be only the smallest distance, the end result would be again 10,000 rows. — luftgekuhltlover, Jun 23 '18 at 09:22
I came up with the query I need for these 10000 points but now the problem is that it takes the distance with the point itself which is 0 and I need the next nearest point. I will add the query in my initial post. — luftgekuhltlover, Jun 23 '18 at 16:18

How to calculate the nearest neighbor distance for 10000 points in a table

1 Answers1