
The query is used very often in the app and is too expensive.

What are the things I can do to optimise it and bring the total time to milliseconds (rather than hundreds of ms)?

NOTES:

  • Removing DISTINCT helps (down to ~460ms), but I need it to get rid of the Cartesian product :( (ideally, show a better way of avoiding it).
  • Removing ORDER BY name helps, but not significantly.

The query:

SELECT DISTINCT properties.*
FROM properties JOIN developments ON developments.id = properties.development_id

 -- Development allocations
 LEFT JOIN allocation_items   AS dev_items  ON dev_items.development_id = properties.development_id
 LEFT JOIN allocations        AS dev_allocs ON dev_items.allocation_id = dev_allocs.id

 -- Group allocations
 LEFT JOIN properties_property_groupings ppg  ON ppg.property_id = properties.id
 LEFT JOIN property_groupings pg              ON pg.id = ppg.property_grouping_id
 LEFT JOIN allocation_items prop_items        ON prop_items.property_grouping_id = pg.id
 LEFT JOIN allocations prop_allocs            ON prop_allocs.id = prop_items.allocation_id

WHERE
  (properties.status <> 'deleted') AND ((
    properties.status <> 'inactive'
    AND (
     (dev_allocs.receiving_company_id = 175 OR prop_allocs.receiving_company_id = 175)
     AND developments.status = 'active'
    )
    OR developments.company_id = 175
   )
   AND EXISTS (
     SELECT 1 FROM development_participations dp
     JOIN participations p ON p.id = dp.participation_id
     WHERE dp.allowed
       AND p.user_id = 387 AND p.company_id = 175
       AND dp.development_id = properties.development_id
     LIMIT 1
   )
)
ORDER BY properties.name

EXPLAIN ANALYZE

 Unique  (cost=72336.86..72517.53 rows=1606 width=4336) (actual time=703.766..710.920 rows=219 loops=1)
   ->  Sort  (cost=72336.86..72340.87 rows=1606 width=4336) (actual time=703.765..704.698 rows=5091 loops=1)
         Sort Key: properties.name, properties.id, properties.status, properties.level, etc etc (all columns)
         Sort Method: external sort  Disk: 1000kB
         ->  Nested Loop Left Join  (cost=0.00..69258.84 rows=1606 width=4336) (actual time=25.230..366.489 rows=5091 loops=1)
               Filter: ((((properties.status)::text <> 'inactive'::text) AND ((dev_allocs.receiving_company_id = 175) OR (prop_allocs.receiving_company_id = 175)) AND ((developments.status)::text = 'active'::text)) OR (developments.company_id = 175))
               ->  Nested Loop Left Join  (cost=0.00..57036.99 rows=41718 width=4355) (actual time=25.122..247.587 rows=99567 loops=1)
                     ->  Nested Loop Left Join  (cost=0.00..47616.39 rows=21766 width=4355) (actual time=25.111..163.827 rows=39774 loops=1)
                           ->  Nested Loop Left Join  (cost=0.00..41508.16 rows=21766 width=4355) (actual time=25.101..112.452 rows=39774 loops=1)
                                 ->  Nested Loop Left Join  (cost=0.00..34725.22 rows=21766 width=4351) (actual time=25.087..68.104 rows=19887 loops=1)
                                       ->  Nested Loop Left Join  (cost=0.00..28613.00 rows=21766 width=4351) (actual time=25.076..39.360 rows=19887 loops=1)
                                             ->  Nested Loop  (cost=0.00..27478.54 rows=1147 width=4347) (actual time=25.059..29.966 rows=259 loops=1)
                                                   ->  Index Scan using developments_pkey on developments  (cost=0.00..25.17 rows=49 width=15) (actual time=0.048..0.127 rows=48 loops=1)
                                                         Filter: (((status)::text = 'active'::text) OR (company_id = 175))
                                                   ->  Index Scan using index_properties_on_development_id on properties  (cost=0.00..559.95 rows=26 width=4336) (actual time=0.534..0.618 rows=5 loops=48)
                                                         Index Cond: (development_id = developments.id)
                                                         Filter: (((status)::text <> 'deleted'::text) AND (SubPlan 1))
                                                         SubPlan 1
                                                           ->  Limit  (cost=0.00..10.00 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=2420)
                                                                 ->  Nested Loop  (cost=0.00..10.00 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=2420)
                                                                       Join Filter: (dp.participation_id = p.id)
                                                                       ->  Seq Scan on development_participations dp  (cost=0.00..1.71 rows=1 width=4) (actual time=0.004..0.008 rows=1 loops=2420)
                                                                             Filter: (allowed AND (development_id = properties.development_id))
                                                                       ->  Index Scan using index_participations_on_user_id on participations p  (cost=0.00..8.27 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=3148)
                                                                             Index Cond: (user_id = 387)
                                                                             Filter: (company_id = 175)
                                             ->  Index Scan using index_allocation_items_on_development_id on allocation_items dev_items  (cost=0.00..0.70 rows=23 width=8) (actual time=0.003..0.016 rows=77 loops=259)
                                                   Index Cond: (development_id = properties.development_id)
                                       ->  Index Scan using allocations_pkey on allocations dev_allocs  (cost=0.00..0.27 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=19887)
                                             Index Cond: (dev_items.allocation_id = id)
                                 ->  Index Scan using index_properties_property_groupings_on_property_id on properties_property_groupings ppg  (cost=0.00..0.29 rows=2 width=8) (actual time=0.001..0.001 rows=2 loops=19887)
                                       Index Cond: (property_id = properties.id)
                           ->  Index Scan using property_groupings_pkey on property_groupings pg  (cost=0.00..0.27 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=39774)
                                 Index Cond: (id = ppg.property_grouping_id)
                     ->  Index Scan using index_allocation_items_on_property_grouping_id on allocation_items prop_items  (cost=0.00..0.36 rows=6 width=8) (actual time=0.001..0.001 rows=2 loops=39774)
                           Index Cond: (property_grouping_id = pg.id)
               ->  Index Scan using allocations_pkey on allocations prop_allocs  (cost=0.00..0.27 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=99567)
                     Index Cond: (id = prop_items.allocation_id)
 Total runtime: 716.692 ms
(39 rows)
Dmytrii Nagirniak
  • The `LIMIT 1` in the EXISTS subquery is useless, and it does not seem to be optimised away. – wildplasser Jul 12 '12 at 08:14
  • The `distinct properties.*` in the final query causes an expensive final sort. And all the `LEFT JOIN`ed table entries don't seem to be referenced by the main query. – wildplasser Jul 12 '12 at 08:47
  • You could replace the `JOIN developments ON developments.id = properties.development_id` by a `WHERE EXISTS (... FROM developments ...)` subquery, since you are not referencing any of the development fields, and are suppressing the duplicates afterwards anyway. – wildplasser Jul 12 '12 at 09:29
  • @wildplasser `LIMIT 1` is recommended by the PG docs so it can build a better execution plan. Yes, DISTINCT is expensive, I know that. Replacing `JOIN developments` with a `WHERE` won't help because that particular join doesn't produce a Cartesian product. See my answer below. – Dmytrii Nagirniak Jul 13 '12 at 00:46
  • @wildplasser, `LEFT JOIN`ed entries *are* used in the where clause, look a bit closer. – Dmytrii Nagirniak Jul 13 '12 at 00:47
  • The two left JOIN ranges could be moved into the first EXISTS subquery; and the only thing that needs them to be LEFT appears to be the `OR developments.company_id = 175` clause in that subquery (which can probably be pulled *up*). Keeping the subquery isolated is a good thing, EXISTS is functionally the same as "select 1 ...LIMIT 1". BTW: you already proved my point in another reaction: moving the joined tables into the subquery reduces the time to ~50%. – wildplasser Jul 13 '12 at 08:38

2 Answers

Answering my own question.

This query has two big issues:

  1. Six LEFT JOINs that produce a Cartesian product (resulting in billions of rows even on a small dataset).
  2. DISTINCT, which then has to sort that huge intermediate result.

So I had to eliminate those.

The way I did it was to replace the JOINs with two subqueries (the transformation is fairly mechanical).
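For reference, a sketch of what such a rewrite might look like, reconstructed from the original query (same tables and predicates; this is not the author's exact code). The idea is that each `LEFT JOIN` chain only existed to feed the `WHERE` clause, so each can become an `EXISTS` subquery that never multiplies the `properties` rows, which also makes `DISTINCT` unnecessary:

```sql
SELECT properties.*
FROM properties
WHERE properties.status <> 'deleted'
  AND (
    (properties.status <> 'inactive'
     AND EXISTS (                    -- developments JOIN, now a semi-join
       SELECT 1 FROM developments d
       WHERE d.id = properties.development_id
         AND d.status = 'active')
     AND (
       EXISTS (                      -- development allocations
         SELECT 1
         FROM allocation_items ai
         JOIN allocations a ON a.id = ai.allocation_id
         WHERE ai.development_id = properties.development_id
           AND a.receiving_company_id = 175)
       OR EXISTS (                   -- group allocations
         SELECT 1
         FROM properties_property_groupings ppg
         JOIN allocation_items ai
           ON ai.property_grouping_id = ppg.property_grouping_id
         JOIN allocations a ON a.id = ai.allocation_id
         WHERE ppg.property_id = properties.id
           AND a.receiving_company_id = 175)))
    OR EXISTS (                      -- the developments.company_id = 175 branch
      SELECT 1 FROM developments d
      WHERE d.id = properties.development_id
        AND d.company_id = 175)
  )
  AND EXISTS (                       -- participation check, unchanged
    SELECT 1
    FROM development_participations dp
    JOIN participations p ON p.id = dp.participation_id
    WHERE dp.allowed
      AND p.user_id = 387 AND p.company_id = 175
      AND dp.development_id = properties.development_id)
ORDER BY properties.name;
```

Because no join ever produces more than one output row per property, `SELECT DISTINCT` becomes plain `SELECT`, and the sort at the end only has to handle the final ~219 rows.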

As a result, the actual time went from ~700–800ms down to ~45ms, which is more or less acceptable.

Dmytrii Nagirniak

Most of the time is spent in the on-disk sort; let it happen in RAM instead by raising work_mem:

SET work_mem TO '20MB';

Then check EXPLAIN ANALYZE again.
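Note that `SET work_mem` is per-session, not server-wide, so it can be scoped to just the connections or transactions that run this query. A sketch (the `20MB` value is just this answer's suggestion, to be tuned against the actual sort size reported by EXPLAIN ANALYZE):

```sql
-- Inspect the current setting (the default is quite small, e.g. 1MB)
SHOW work_mem;

-- Raise it for the rest of this session only
SET work_mem TO '20MB';

-- Or limit the change to a single transaction
BEGIN;
SET LOCAL work_mem TO '20MB';
-- EXPLAIN ANALYZE SELECT ... ;   -- the sort should now say "Sort Method: quicksort"
COMMIT;
```

If EXPLAIN ANALYZE still reports `Sort Method: external sort  Disk: ...`, the value is not yet large enough for that sort.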

Praveen Kumar Purushothaman
Frank Heikens
  • This won't optimise the query. The query itself is the root cause. And I'm already using `fsync=off`, BTW. – Dmytrii Nagirniak Jul 12 '12 at 05:03
  • But it does change the behavior of the query planner. It has nothing to do with fsync, that's important for writes not for SELECTs. First give the database some RAM to use. – Frank Heikens Jul 12 '12 at 05:08
  • Ok. I've given PG 20MB of memory. It does indeed increase the speed (from ~700ms down to ~400). Thanks for the suggestion. But I got the most out of it (~45ms) by rewriting the query to use subqueries rather than LEFT JOINs. – Dmytrii Nagirniak Jul 12 '12 at 05:25