1

I was studying the CTE docs today and playing around with an example below (slightly modified from the example in the doc):

WITH RECURSIVE t(n) AS (
    VALUES (1), (2)
  UNION ALL
    (
      SELECT n + 1
      FROM t
      WHERE n < 100
    )
)
SELECT * FROM t;

This piece of code outputs

n
-----
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
10
10
11
11
12
12
13
13
...

Which is fine by itself. However, in the doc, the explanation goes like the following (https://www.postgresql.org/docs/9.1/static/queries-with.html):

  1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
  2. So long as the working table is not empty, repeat these steps:

    1. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
    2. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

My interpretation of the explanations is as follows:

Initially, we insert (1), (2) into the working table. Then in the recursive step, we take the working table, which currently has (1), (2), UNION ALL the results of the recursive step, which returns (2), (3), and then place the results into intermediate table.

So after the first recursive call, we have (1), (2) in working table and (1), (2), (2), (3) in intermediate table.

Then we replace the values of working table with the values of intermediate table, and then empty out the intermediate table. So now working table has (1), (2), (2), (3) and intermediate table is empty.

Now next recursive call, we will take the working table which contains (1), (2), (2), (3), recurse on it, which yields (2), (3), (3), (4), and append it to the working table. So the working table now should have (1), (2), (2), (3), (2), (3), (3), (4). However, the results starts with (1), (2), (2), (3), (3), (3), (4), (4) etc.

Can anyone let me know where I went wrong in my line of reasoning?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
FurtiveFelon
  • 14,714
  • 27
  • 76
  • 97
  • http://stackoverflow.com/q/37880628/330315 and http://stackoverflow.com/questions/30015842 –  Jul 04 '16 at 21:20

2 Answers2

1

You seem to be confusing result and intermediate table, which are not the same.

(1), (2) is produced by the "non-recursive" term and placed in the result.
(1), (2) also goes into the working table.
-- start iteration here
(2), (3) is produced by the "recursive" term and added to the result.
(2), (3) are also placed in an intermediate table.
(2), (3) the content of the intermediate table is moved to the working table.
-- iterate

As the manual adds right there (link to the current manual):

Strictly speaking, this process is iteration not recursion, but RECURSIVE is the terminology chosen by the SQL standards committee.

Not related to your particular misunderstanding, just to explain my quotes around "recursive".

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0
1,2             = 1,2 -> keep
1,2 plus 1 = 2,3 -> keep
2,3 plus 1 = 3,4 -> keep
3,4 plus 1 = 4,5 -> keep

The result is 1,2,2,3,3,4,4,5, ....

The working table has only the immediately prior result

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51