Is the outer WHERE clause optimized in a Recursive CTE?

Question

With the following table definition:

CREATE TABLE Nodes(id INTEGER, child INTEGER);

INSERT INTO Nodes(id, child) VALUES(1, 10);
INSERT INTO Nodes(id, child) VALUES(1, 11);
INSERT INTO Nodes(id, child) VALUES(1, 12);

INSERT INTO Nodes(id, child) VALUES(10, 100);
INSERT INTO Nodes(id, child) VALUES(10, 101);
INSERT INTO Nodes(id, child) VALUES(10, 102);

INSERT INTO Nodes(id, child) VALUES(2, 20);
INSERT INTO Nodes(id, child) VALUES(2, 21);
INSERT INTO Nodes(id, child) VALUES(2, 22);

INSERT INTO Nodes(id, child) VALUES(20, 200);
INSERT INTO Nodes(id, child) VALUES(20, 201);
INSERT INTO Nodes(id, child) VALUES(20, 202);

With the following query:

WITH RECURSIVE members(base, id, level) AS ( 
    SELECT n1.id, n1.id, 0
    FROM Nodes n1
    LEFT OUTER JOIN Nodes n2 ON n2.child = n1.id
    WHERE n2.id IS NULL

    UNION 

    SELECT m.base, n.child, m.level + 1
    FROM members m 
    INNER JOIN Nodes n ON  m.id=n.id
) 
SELECT m.id, m.level 
FROM members m 
WHERE m.base IN (1)

Is the outer WHERE clause optimized in a Recursive CTE? An alternate that I have considered using is:

WITH RECURSIVE members(id, level) AS ( 
    VALUES (1, 0)

    UNION 

    SELECT n.child, m.level + 1
    FROM members m 
    INNER JOIN Nodes n ON  m.id=n.id
) 
SELECT m.id, m.level 
FROM members m

but it has the problem of not being able to create a view out of it. Therefore, if the performance difference between the two is minimal, I'd prefer to create a view out of the recursive CTE and then just query that.

Where is the rest o that VALUES statement; it needs to have two elements. — Pieter Geerkens, Sep 28 '14 at 04:42
@PieterGeerkens yes, it is dynamically filled in with `(_ID_, 0),` repeated for each id. The second query thus assumes that only root node ids are passed in and does no checking on its own. — chacham15, Sep 28 '14 at 04:45
Oh! I didn't realize that `%s` was a magic string interpreted by the engine as `whatever is needed, correctly of course`. — Pieter Geerkens, Sep 28 '14 at 04:46
@PieterGeerkens my apologies, `%s` is a typical fill in stuff here phrase and I thought it wasnt very material to the question. Im sorry if I confused you. I changed it to make it more clear. — chacham15, Sep 28 '14 at 04:48

score 1 · Accepted Answer · answered Sep 28 '14 at 08:11

To be able to apply the WHERE clause to the queries inside the CTE, the database would be required to prove that

all values in the first column are unchanged by the recursion and go back to the base query, and, in general, that
it is not possible for any filtered-out row to have any children that could show up in the result of the query, or affect the CTE in any other way.

Such a prover does not exist. See restriction 22 of Subquery flattening.

score 0 · Answer 2 · answered Sep 28 '14 at 05:05

To see why your first query is non-optimal, try running both with UNION ALL instead of just UNION. With the sample data given, the first will return 21 rows while the second returns only 7.

The duplicate rows in the actual first query are subsequently eliminated by performing a sort and duplicate elimination, while this step is not necessary in the actual second query.

Is the outer WHERE clause optimized in a Recursive CTE?

2 Answers2