How to optimize sql query with subqueries, perhaps by lateral join?

Question

I'm trying to optimize sophisticated sql query, it would be executed on each map bonding box change. I thought that INNER LATERAL JOIN would be fastest but it isn't. Does anybody know how to speed up this query and how to make better use of LATERAL JOIN?

The fastest query I've made:

SELECT r0."id", r0."name" 
FROM "hiking"."routes" AS r0 
INNER JOIN "hiking"."hierarchy" AS h1 ON r0."id" = h1."parent" 
INNER JOIN (SELECT DISTINCT unnest(s0."rels") AS "rel" 
            FROM "hiking"."segments" AS s0 
            WHERE (ST_Intersects(s0."geom", ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857)))) AS s2 ON TRUE 
WHERE (s2."rel" = h1."child");

Planning time: ~0.605 ms Execution time: ~37.232 ms

Actually the same as above but with LATERAL JOIN, is it correct that it is slower?

SELECT r0."id", r0."name" 
FROM "hiking"."routes" AS r0 
INNER JOIN "hiking"."hierarchy" AS h1 ON r0."id" = h1."parent" 
INNER JOIN LATERAL (SELECT DISTINCT unnest(s0."rels") AS "rel" 
                    FROM "hiking"."segments" AS s0 
                    WHERE (ST_Intersects(s0."geom", ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857)))) AS s2 ON TRUE 
WHERE (s2."rel" = h1."child");

Planning time: ~1.353 ms Execution time: ~38.518 ms

Slowest query with subquery in subquery (that was my first so I've improved it a bit):

SELECT r0."id", r0."name" 
FROM "hiking"."routes" AS r0 
INNER JOIN (SELECT DISTINCT h0."parent" AS "parent" 
            FROM "hiking"."hierarchy" AS h0 
            INNER JOIN (SELECT DISTINCT unnest(s0."rels") AS "rel" 
                        FROM "hiking"."segments" AS s0 
                        WHERE (ST_Intersects(s0."geom", ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)), 3857)))) AS s1 ON TRUE 
            WHERE (h0."child" = s1."rel")) AS s1 ON TRUE 
WHERE (r0."top" AND (r0."id" = s1."parent"));

Planning time: ~1.017 ms Execution time: ~41.288 ms

You're trying to optimize a query that completes in <40 ms? What do you want, light speed? — Evan Carroll, Jan 21 '18 at 00:40
We also don't care about the slower-than-best queries, what we do care about is the `EXPLAIN ANALYZE` — Evan Carroll, Jan 21 '18 at 00:41

Abelisto · Accepted Answer · 2018-01-21T01:07:16.533

It is hard to reproduce the logic of your query without any knowledges about your DB but I will try, so be patient:

SELECT r0."id", r0."name" 
FROM "hiking"."routes" AS r0 
INNER JOIN "hiking"."hierarchy" AS h1 ON r0."id" = h1."parent" 
WHERE 
  EXISTS (
    SELECT 1
    FROM "hiking"."segments" AS s0 
    WHERE (
      ST_Intersects(
        s0."geom",
        ST_SetSrid(ST_MakeBox2D(ST_GeomFromText('POINT(1285982.015631 7217169.814674)', -1), ST_GeomFromText('POINT(2371999.313507 6454022.524275)', -1)),
        3857)))
      AND array[h1."child"] <@ s0."rels");

There are two points:

Filtering data by EXISTS or NOT EXISTS sometimes faster then by joining
Instead of unnesting array field to compare its elements with some value you could to use the array comparison operator(s). Having appropriate GIN index it is much faster (docs here and here).

Here is simple example how to use indexes on arrays and how its faster:

create table foo(bar int[]);
insert into foo(bar) select array[1,2,3,x] from generate_series(1,1000000) as x;
create index idx on foo using gin (bar); // Note this
select * from foo where 666 in (select unnest(bar)); // 6936,345 ms on my HW
select * from foo where array[666] <@ bar; // 45,524 ms

Think this is any easier method `ST_Intersects( ST_SetSRID(ST_MakeEnvelope( 1285982.015631, 6454022.524275, 2371999.313507, 7217169.814674), 3857), s0.geom );` — Evan Carroll, Jan 21 '18 at 01:01
@EvanCarroll Shame to me! I never used PostGIS extension... :( — Abelisto, Jan 21 '18 at 01:09
Your query reduced time significantly `Planning time: ~0.438 ms Execution time: ~10.656 ms`, thanks a lot for all new knowledge! I've imported only small pice of world from osm, so in whole dataset it would perform even better. — luzny, Jan 21 '18 at 08:17

How to optimize sql query with subqueries, perhaps by lateral join?

1 Answers1