3

I hawe geospatial tables named node and ways.

I want to set end_node_id column of ways table with node table attribute using spatial join. two tables have about 100K data.

update ways
set
    end_node_id = n.node_id
from
    ways w
inner join
    nodes n
on
    st_endpoint(w.shape) = n.shape;

But this query takes so many times. After 15 minutes I stopped the query. Is there any performence query for this operation?

Update Explain:

Update on ways w (cost=0.00..669909619.43 rows=24567397 width=576)  
->  Nested Loop  (cost=0.00..669909619.43 rows=24567397 width=576)
          Join Filter: (st_endpoint(w.shape) = n.shape)
          ->  Seq Scan on ways w (cost=0.00..8960.61 rows=120161 width=564)
          ->  Materialize  (cost=0.00..12200.81 rows=204454 width=52)
                        ->  Seq Scan on nodes n  (cost=0.00..9181.54 rows=204454 width=52)
barteloma
  • 6,403
  • 14
  • 79
  • 173

1 Answers1

4

Don't include ways in the from clause! This doesn't do what you expect. Presumably, you want:

update ways w
    set end_node_id = n.node_id
from nodes n
where st_endpoint(w.shape) = n.shape;

In your formulation, the ways in the update is a different reference from the ways in the from. So, your code is creating Cartesian product -- which no doubt slows down the processing. Note that this is different from the behavior of SQL Server, which has similar syntax.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I tried this solution but it is executing for 1 hours, not complated yet. – barteloma Jan 12 '21 at 14:27
  • @barteloma . . . I suspect that `nodes` is pretty big. This is still doing a Cartesian product between `ways` and `nodes`. You may need some specific geo indexes if you need to speed this up. – Gordon Linoff Jan 12 '21 at 14:45
  • I have updated the post with explain. But I do not understand so much of explain. Both of tables have 100K data and they have spatial indexes. – barteloma Jan 12 '21 at 14:49
  • @barteloma . . . Every row in `ways` has to be compared to 100,000 rows in `nodes`. That takes a while, alas. – Gordon Linoff Jan 12 '21 at 21:59
  • Yer right, but if I set the date filter `where created_at::date=current_date` the query takes long time yet. – barteloma Jan 14 '21 at 13:13
  • Use `created_at >= current_date and created_at < current_date + interval '1 day'`. This allows the query to use an index on `created_date`, assuming you have one. – Gordon Linoff Jan 14 '21 at 13:34