20

I have a simple question. Somehow I was unable to find a definitive answer.

How much is WITH RECURSIVE syntax optimized in PostgreSQL? By that I mean: is it merely a syntactic sugar for a series of non recursive queries, OR is it more of a single statement that despite its complicated semantics has been optimized as a whole. A follow-up question - just about how much is it possible to optimize this kind of syntax? Of course some concrete data on the matter is most welcome.

julx
  • 8,694
  • 6
  • 47
  • 86

2 Answers2

20

I've found it optimized up to a point.

The various subqueries are re-used as expected and are optimized individually, and Postgres optimizes the latter just like any other query.

My main gripe with it has to do with that it won't inject constraints into the CTEs when it could.

For instance:

with recursive
parents as (
select node.id,
       node.parent_id
from nodes as node
union all
select node.id,
       parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents
where id = 2;

Postgres would ideally understand, in the above, that (since node.id is returned as is) it can do:

with recursive
parents as (
select node.id,
       node.parent_id
from nodes as node
where id = 2
union all
select node.id,
       parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents;

... and use an index scan on the primary key. In practice, it'll actually do exactly when the CTE tells it to do: recursively pull all parents for all rows, place the result set in an unnamed temporary table if needed, and then check each row from the result set one for id = 2.

In other words, a CTE does not keep a trace of the "originating" table/row/column set that it's returning. Until this gets optimized properly, creating a view on a recursive query is crazy at best.

A good workaround in the meanwhile is to create an sql function instead:

create function parents(id int) as returns table (id int) $$
    with recursive
    parents as (
    select node.id,
           node.parent_id
    from nodes as node
    where id = $1
    union all
    select node.id,
           parent.parent_id
    from parents as node
    join nodes as parent on parent.id = node.parent_id
    )
    select parent_id
    from parents;
$$ language sql stable strict rows 5 cost 1;

Another issue is you can't use FOR UPDATE with recursive CTEs (for very much the same reason, in fact).

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • Given the age of this answer, are you aware of any recent changes in the behavior you described? – Oliver Salzburg Jul 20 '17 at 09:56
  • @OliverSalzburg: I honestly have no idea, but it should be reasonably easy to test by comparing the query plans of the queries in the answer. – Denis de Bernardy Jul 20 '17 at 10:16
  • Hi, I not see before, perhaps this answer is also the answer of [this question](https://stackoverflow.com/q/51982247/287948)... But there are composite-pk, array and update context. – Peter Krauss Aug 23 '18 at 09:31
5

My experience is that it is indeed very well optimized.

Check out the execution plan for your query generated by EXPLAIN ANALYZE and you'll see how "costly" it really is (and then compare that e.g. to a self written recursive function)

  • 1
    The whole reason for have `WITH RECURSIVE` at the language level is so that the database knows what you're trying to do and can then act based on your intent. – mu is too short May 02 '11 at 19:32
  • @mu Yea, well that's what I reckoned. However it's not obvious how it could for example be optimized just by using indices and such even on the database level. Some operation are just hard in general. Following the suggestion I'll post my findings as soon as I'm done. – julx May 02 '11 at 19:59
  • @julkiewicz A transitive closure table is in effect an index for such queries. I see no reason an index type couldn't be offered that used similar techniques. – EricS Sep 10 '13 at 08:31