Return a list of IDs in which geometry connect to each other with SQL Amazon Redshift

Question

I have a list of geometry IDs which contains the Linestrings. I want to create a query that returns the object IDs as a LISTAGG from those that connect to each other.

Here an example of a geometry I have with their IDs:

I want to input all IDs into SQL and return all geometry that are connected. So I'm expecting something like this:

|                         IDs                               |
|:----------------------------------------------------------|
|0001, 0002, 0003, 0004, 0005, 0006, 0007                   |
|0008, 0009, 0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017, 0018, 0019|

I have a table with a set of IDs and their respective Linestrings as this:

| ID | Vector                            |
|:---|:----------------------------------|
|0001|Linestring(1 2, 2 3, 3 4, 4 5, 5 6)|
|0002|Linestring(6 7, 8 9)               |
|0003|Linestring(9 10, 11 12, 13 14)     |
|0004|Linestring(14 15, 16 17)           |
|0005|Linestring(17 18, 18 19, 19 20)    |

And so on. I'm pretty new with Amazon Redshift and I'm just struggling to find a way to do this. I tried using ST_Buffer but then I'm stuck. Since the geometry connects to each other, maybe something that can bring all connections. I already got the A-B point of each geometry, but now I need to find a way to get the whole set of links.

This is what I currently have:

CREATE TEMP TABLE geoList
(ID BIGINT,
AB_Coords VARCHAR);

INSERT INTO geoList
SELECT ID, 
CONCAT(SPLIT_PART(vector,',',1),')') AS startEndPoint
FROM geometry
WHERE ID IN (0001, 0002, 0003, 0004, 0005, 0006, 0007, 0008, 0009, 0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017, 0018, 0019);

INSERT INTO geoList
SELECT ID, 
CONCAT('LINESTRING (',TRIM(SPLIT_PART(vector,',',LEN(vector)-LEN(REPLACE(vector,',',''))+1))) AS startEndPoint
FROM geometry
WHERE ID IN (0001, 0002, 0003, 0004, 0005, 0006, 0007, 0008, 0009, 0010, 0011, 0012, 0013, 0014, 0015, 0016, 0017, 0018, 0019);

SELECT *, g1.ID = g2.ID AS sameGeo FROM geoList g1
LEFT JOIN geoList g2
ON g1.AB_Coords = g2.AB_Coords

I'm stuck here... Thanks!

score 1 · Answer 1 · answered Feb 04 '23 at 03:58

The algorithm you need is clustering, specifically DBSCAN. With systems that inherently support it, you just call DBSCAN, and then aggregate by cluster ID it returns to get the list of groups.

E.g. with BigQuery, similar query can be done in PostgreSQL:

with data as (
  select 1 id, 'Linestring(1 2, 2 3, 3 4, 4 5, 5 6)' line union all
  select 2, 'Linestring(1 3, 4 1)' union all
  select 6, 'Linestring(25 10, 30 15)' union all
  select 7, 'Linestring(25 15, 30 10)'
)
select array_agg(id)
from (
  select id, st_clusterdbscan(st_geogfromtext(line), 0, 1) over() as cluster_id 
  from data
) group by cluster_id

Result:

[1,2]
[6,7]

It looks like Redshift does not support DBSCAN though. The workaround would be to use external code - there are implementations for most languages. Another option is to do it natively in SQL using Recursive CTE, e.g. see discussion about implementing DBSCAN this way in this paper: https://db.in.tum.de/~schuele/data/ssdbm2022.pdf

Return a list of IDs in which geometry connect to each other with SQL Amazon Redshift

1 Answers1

Linked