How to pull hierarchical parents till the root parent without traversing redundant paths?

Question

I have a recursive SQL query that traverses a tree-like structure using a table variable. The query works correctly, but I'm facing performance issues. The idea to pull hierarchical parents till the root parent without traversing redundant paths(assuming it improves performance)

Here's the current structure of my SQL query:

DECLARE @lvl AS INT
DECLARE @rows AS INT
DECLARE @foo AS TABLE(
    parent_id INT,
    child_id INT,
    lvl INT)

--Get anchor condition
INSERT @foo (parent_id, child_id, lvl)
SELECT parent_id, child_id, 0
FROM bar WHERE child_id IN (SELECT child_id from bar WHERE child_id BETWEEN 50 AND 150)



SET @rows=@@ROWCOUNT
SET @lvl=0

--Do recursion
WHILE @rows > 0
BEGIN
    SET @lvl = @lvl + 1

    INSERT @foo (parent_id, child_id, lvl)
    SELECT DISTINCT b.parent_id, b.child_id, @lvl
    FROM bar b
    INNER JOIN @foo f ON b.child_id = f.parent_id
    -- avoid revisiting nodes: do not insert a node if its parent_id is already in the table
    LEFT JOIN @foo dup ON dup.child_id = b.parent_id 
    WHERE f.lvl = @lvl-1 AND dup.parent_id IS NULL

    SET @rows=@@ROWCOUNT 
END

SELECT * FROM @foo ORDER BY lvl DESC

Used a loop-based approach combined with the use of a temporary table variable (@foo) to gradually build the hierarchical parent-child relationships, ensuring that redundant paths are avoided during the traversal process.

Expected Output:

Each child brings in its parent and its parent brings in its own parent.Until the root parent

Code snippet to create the table

CREATE TABLE bar(
    parentUId INT,
    childUid INT
);

WHILE @counter <= 10000
BEGIN
    DECLARE @parent_id INT;
    DECLARE @child_id INT;
    
    -- Generate child_id
    SET @child_id = @counter;
    
    -- Generate parent_id
    SET @parent_id = FLOOR((@counter - 1) / 10) + 1;
    
    INSERT INTO bar (parent_id, child_id)
    VALUES (@parent_id, @child_id);
    
    SET @counter = @counter + 1;
END;

The current query produces the desired output, but it takes a significant amount of time to execute, especially when the data set grows larger. I have a hierarchical structure that consists of 12 levels. The provided example is simplified.

I'm looking for suggestions on how to optimize this SQL query to improve its performance. Are there any modifications I can make to the query to achieve better performance? Any insights or alternative approaches would be greatly appreciated.

I already tried recursive CTE.

JNevill · Answer 1 · 2023-06-02T20:19:51.693

3

Solving via recursive CTE is definitely the better approach to running SQL in a loop procedurally.

In this case it would look something like:

WITH reccte AS
(
    SELECT ParentUID, ChildUID, 0 as Depth
    FROM bar 
    WHERE ChildUID NOT IN (SELECT DISTINCT ParentUID FROM bar)

    UNION ALL
    SELECT b.ParentUID, b.ChildUID, r.Depth + 1
    FROM reccte r
        INNER JOIN bar b
            ON r.ParentUID = b.ChildUID
)
SELECT DISTINCT * FROM reccte

Update: I forgot to add the dbfiddle link showing this in action.

edited Jun 02 '23 at 20:19

answered Jun 02 '23 at 18:01

JNevill

46,980
4
38
63

This method potentially involves traversing through redundant pathways that have already been visited, which could result in unnecessary computational expense. Moreover, an additional process of ensuring distinctness is required at the end of this operation, which could potentially increase its cost further. Isn't this approach therefore considered highly resource-intensive? Please correct me if i am wrong. – minato namikaze Jun 03 '23 at 03:25
1

Compared to your approach, this is going to be much faster. There just isn't any way to procedurally run sql in a loop that is going to be cost effective. You must remember that SQL is declarative. We tell the database what it is we want returned to us, not how to execute. The database is where all the brains are to determine the best path for getting the result set you described with your sql. In most cases, a single sql submission is going to be faster than multiple. In your case you are likely passing in hundreds or thousands of statements. It's just not going to go fast. – JNevill Jun 05 '23 at 13:10

score -1 · Answer 2 · answered Jun 02 '23 at 18:55

indexes are almost always a correct choice when trying to fix a sql performance problem. Do you have an index on @foo or bar ?

These sort of indexes should improve performance by at least as much as smart code will. And, they will be compatible with helping any of the code solutions.

create index X1 on bar (childId) include (parentId)

create index X2 on bar (parentId) include (childId)

create index Y1 on @foo (childId) include (parentId)

create index Y2 on @foo (parentId) include (childId)

How to pull hierarchical parents till the root parent without traversing redundant paths?

2 Answers2