We are running Postgres 9.3.5 (07/2014). We have a fairly complex data-warehouse/reporting setup in place (ETL, materialized views, indexing, aggregations, analytical functions, ...).
What I just discovered may be difficult to handle in the optimizer (?), but it makes a huge difference in performance. The following is only sample code, closely resembling our real query, to strip out unnecessary complexity:
create view foo as
select
sum(s.plan) over w_pyl as pyl_plan, -- money planned to spend in this pot/loc/year
sum(s.booked) over w_pyl as pyl_booked, -- money already booked in this pot/loc/year
-- money already booked in this pot/loc the years before (stored as sum already)
last_value(s.booked_prev_years) over w_pl as pl_booked_prev_years,
-- update 2014-10-08: maybe the following additional selected columns
-- may be implementation-/test-relevant since they could potentially be determined
-- by sorting within the partition:
min(s.id) over w_pyl,
max(s.id) over w_pyl,
-- ... anything could follow here ...
x.*,
s.*
from
pot_location_year x -- may be some materialized view or (cache/regular) table
left outer join spendings s
on (s.pot = x.pot and s.loc = x.loc and s.year = x.year)
window
w_pyl as (partition by x.pot, x.year, x.loc),
w_pl as (partition by x.pot, x.loc order by x.year)
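One caveat for anyone reusing this pattern: with an ORDER BY in the window definition and no explicit frame clause, the default frame runs from UNBOUNDED PRECEDING to CURRENT ROW, so last_value(..) returns the current row's value rather than the value from the last year of the partition. If the intent is the latest year's value per pot/loc, an explicit frame would be needed, roughly like this (a sketch, not our production code):

```sql
-- sketch: last_value over the WHOLE pot/loc partition requires an
-- explicit frame; the default frame ends at the current row
last_value(s.booked_prev_years) over (
  partition by x.pot, x.loc
  order by x.year
  rows between unbounded preceding and unbounded following
) as pl_booked_prev_years
```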
We have these two relevant indexes in place:
pot_location_year_idx__p_y_l -- on pot, year, loc
pot_location_year_idx__p_l_y -- on pot, loc, year
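For reference, assuming the index names above describe their column order, their definitions would look roughly like this:

```sql
-- sketch of the two indexes, names and columns taken from the query above
create index pot_location_year_idx__p_y_l on pot_location_year (pot, year, loc);
create index pot_location_year_idx__p_l_y on pot_location_year (pot, loc, year);
```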
Now we run an EXPLAIN for a test query:
explain select * from foo fetch first 100 rows only
This shows very bad performance, because the pyl index is used and the result set then has to be sorted a second, unnecessary time :-( The outermost WindowAgg/Sort step re-sorts the rows into pot, loc, year order, because that ordering is required by our last_value(..) as pl_booked_prev_years:
Limit (cost=289687.87..289692.12 rows=100 width=512)
-> WindowAgg (cost=289687.87..292714.85 rows=93138 width=408)
-> Sort (cost=289687.87..289920.71 rows=93138 width=408)
Sort Key: x.pot, x.loc, x.year
-> WindowAgg (cost=1.25..282000.68 rows=93138 width=408)
-> Nested Loop Left Join (cost=1.25..278508.01 rows=93138 width=408)
Join Filter: ...
-> Nested Loop Left Join (cost=0.83..214569.60 rows=93138 width=392)
-> Index Scan using pot_location_year_idx__p_y_l on pot_location_year x (cost=0.42..11665.49 rows=93138 width=306)
-> Index Scan using ... (cost=0.41..2.17 rows=1 width=140)
Index Cond: ...
-> Index Scan using ... (cost=0.41..0.67 rows=1 width=126)
Index Cond: ...
So the obvious problem is that the planner should choose the existing ply index instead, so that the rows would not have to be sorted twice.
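One workaround worth trying: since PARTITION BY treats its key columns as an unordered set, the w_pyl keys can be listed in (pot, loc, year) order without changing the result. Both windows then agree on a single sort order that matches pot_location_year_idx__p_l_y, which may let the planner satisfy both WindowAgg steps with one sort (or the index scan alone). A sketch of the rewritten window clause:

```sql
-- semantically equivalent view definition: partition keys reordered to
-- match the p_l_y index, so both windows can share a single sort on
-- (pot, loc, year)
window
  w_pyl as (partition by x.pot, x.loc, x.year),
  w_pl  as (partition by x.pot, x.loc order by x.year)
```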