0
DROP TABLE IF EXISTS t;

CREATE TABLE t(
    mypath varchar(100),
    parent_path varchar(100)
);

INSERT INTO t VALUES ('a', NULL),('a/b', 'a'),('a/b/c', 'a/b');

-- Listing all parent paths

1) using LIKE, not making use of parent_path column:

SELECT a.mypath, b.mypath aS parent_path
FROM t a
JOIN t b ON a.mypath LIKE b.mypath + '%' AND a.mypath != b.mypath

enter image description here

2) Using a recursive cte, making use of parent_path column

WITH cte AS (
    SELECT mypath, parent_path
    FROM t

    UNION ALL

    SELECT a.mypath, b.parent_path
    FROM cte a
    JOIN t b ON a.parent_path = b.mypath
)
SELECT * FROM cte WHERE parent_path IS NOT NULL;

enter image description here

On large datasets, what are the pros and cons of each method, performance wise ? Am I right to thing the rcte method should be faster ? Should LIKE be able to make use of an index, since there is only a trailing wildcard ?

Ludovic Aubert
  • 9,534
  • 4
  • 16
  • 28
  • I tend to use Range Keys. Once established, they make it super easy for navigation and/or aggregation https://stackoverflow.com/questions/43127166/efficient-way-to-get-ancestors-materialized-path/43128196#43128196 – John Cappelletti Feb 28 '20 at 16:59
  • 1
    Is this data overly simplified? If not, would a `hierarchyid` not be a better suite here? – Thom A Feb 28 '20 at 17:00
  • You mean a FK to an int PK ? Could be a good idea, yes. – Ludovic Aubert Feb 28 '20 at 17:03
  • You can test this yourself easily and examine the execution plans on your real data using your real queries. Performance questions are almost always highly specific to the schema, query, and data. – SMor Feb 28 '20 at 18:12

0 Answers0