2

I have been reading around With Query in Postgres. And this is what I'm surprised with

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;

I'm not able to understand how does the evaluation of the query work.

  • t(n) it sound like a function with a parameter. how does the value of n is passed.

Any insight on how the break down happen of the recursive statement in SQL.

LivingColors
  • 123
  • 7

2 Answers2

4

This is called a common table expression and is a way of expressing a recursive query in SQL:

t(n) defines the name of the CTE as t, with a single column named n. It's similar to an alias for a derived table:

select ... 
from (
  ...
) as t(n);

The recursion starts with the value 1 (that's the values (1) part) and then recursively adds one to it until the 99 is reached. So it generates the numbers from 1 to 99. Then final query then sums up all those numbers.

n is a column name, not a "variable" and the "assignment" happens in the same way as any data retrieval.

WITH RECURSIVE t(n) AS (
    VALUES (1) --<< this is the recursion "root"
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100 --<< this is the "recursive part"
)
SELECT sum(n) FROM t;

If you "unroll" the recursion (which in fact is an iteration) then you'd wind up with something like this:

select x.n + 1
from (
  select x.n + 1
  from (
    select x.n + 1
    from (
      select x.n + 1
      from (
         values (1)
      ) as x(n) 
    ) as x(n)
  ) as x(n)
) as x(n)

More details in the manual:
https://www.postgresql.org/docs/current/static/queries-with.html

  • I understand what the SQL statement does all I want to know how does the evaluation work since. like for example how does the value n is assign to 1 (I know it get assigned by VALUES(1) but how?) next the Union ALL on `VALUES(1) and SELECT n+1 ... ` It just that it way different that how a traditional recursion function we write in high end language like `C` or `C++` – LivingColors Jun 17 '16 at 11:54
  • The purpose of the question is to understand the evaluation. So that I can be confident enough to write one if when needed. – LivingColors Jun 17 '16 at 11:55
  • @LivingColors `n` isn't a variable. It's a column name. –  Jun 17 '16 at 11:56
  • `Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.` . Can you help me understand what is the recursive term in the above example. I'm not able to understand which part of the statement is causing a recursion. – LivingColors Jun 17 '16 at 12:09
  • `values(1)` is the non-recursive part (aka "root"). The select after the `union all` is the recursive part (because it references/joins to the CTE (named `t`) –  Jun 17 '16 at 12:11
  • Looking at the opened form, how does `select x.n + 1 from ( values (1) ) as x(n)` adds column '2' to a temporary table that doesn't have it yet? – ibrahim tanyalcin Jun 10 '19 at 08:51
0

If you are looking for how it is evaluated, the recursion occurs in two phases.

  1. The root is executed once.
  2. The recursive part is executed until no rows are returned. The documentation is a little vague on that point.

Now, normally in databases, we think of "function" in a different way than we think of them when we do imperative programming. In database terms, the best way to think of a function is "a correspondence where for every domain value you have exactly one corresponding value." So one of the immediate challenges is to stop thinking in terms of programming functions. Even user-defined functions are best thought about in this other way since it avoids a lot of potential nastiness regarding the intersection of running the query and the query planner... So it may look like a function but that is not correct.

Instead the WITH clause uses a different, almost inverse notation. Here you have the set name t, followed (optionally in this case) by the tuple structure (n). So this is not a function with a parameter, but a relation with a structure.

So how this breaks down:

SELECT 1 as n where n < 100
UNION ALL
SELECT n + 1 FROM (SELECT 1 as n) where n < 100
UNION ALL
SELECT n + 1 FROM (SELECT n + 1 FROM (SELECT 1 as n)) where n < 100

Of course that is a simplification because internally we keep track of the cte state and keep joining against the last iteration, so in practice these get folded back to near linear complexity (while the above diagram would suggest much worse performance than that).

So in reality you get something more like:

 SELECT 1 as n where 1 < 100
 UNION ALL
 SELECT 1 + 1 as n where 1 + 1 < 100
 UNION ALL
 SELECT 2 + 1 AS n WHERE 2 + 1 < 100
 ...

In essence the previous values carry over.

Chris Travers
  • 25,424
  • 6
  • 65
  • 182