0

Is there a simple way to find the full path and update it for all topic with such data structure.

                 TOPIC_0000070 
                     | 
                 TOPIC_0000054
             /       |          \
    TOPIC_0000000 TOPIC_0000001 TOPIC_0000002
  • Initial data:

CREATE TABLE topics (id varchar, path ltree);

INSERT INTO "topics"(id, path) VALUES ('TOPIC_0000000', 'TOPIC_0000054.TOPIC_0000000');
INSERT INTO "topics"(id, path) VALUES ('TOPIC_0000001', 'TOPIC_0000054.TOPIC_0000001');
INSERT INTO "topics"(id, path) VALUES ('TOPIC_0000002', 'TOPIC_0000054.TOPIC_0000002');
INSERT INTO "topics"(id, path) VALUES ('TOPIC_0000054', 'TOPIC_0000070.TOPIC_0000054');
INSERT INTO "topics"(id, path) VALUES ('TOPIC_0000070', 'TOPIC_0000070');
id           | path                      |
-------------+---------------------------+
TOPIC_0000000|TOPIC_0000054.TOPIC_0000000|
TOPIC_0000001|TOPIC_0000054.TOPIC_0000001|
TOPIC_0000002|TOPIC_0000054.TOPIC_0000002|
...          |...                        |
TOPIC_0000054|TOPIC_0000070.TOPIC_0000054|
TOPIC_0000070|TOPIC_0000070              |
  • The expected result:
id           | path                                    |
-------------+-----------------------------------------+
TOPIC_0000000|TOPIC_0000070.TOPIC_0000054.TOPIC_0000000|
TOPIC_0000001|TOPIC_0000070.TOPIC_0000054.TOPIC_0000001|
TOPIC_0000002|TOPIC_0000070.TOPIC_0000054.TOPIC_0000002|
...          |...                                      |
TOPIC_0000054|TOPIC_0000070.TOPIC_0000054              |
TOPIC_0000070|TOPIC_0000070                            |

2 Answers2

1

You can arrive to that following a series of steps:

I recreated your table as

CREATE TABLE topics (id varchar, path varchar);

and inserted the same rows as per your example

1. Recursive query

WITH RECURSIVE path_finder AS
  (SELECT t.id,
          t.path PATH
   FROM topics t
   UNION SELECT pf.id,
                p.path || coalesce('.' || pf.id, p.id)
   FROM topics p
   INNER JOIN path_finder pf ON substring(pf.path, 0, position('.' in pf.path)) = p.id
   WHERE p.id <> p.path 
select * from path_finder order by 1,2

The join here checks that the prefix of a path string equals the id of an anchestor

substring(pf.path,0,position('.' in pf.path)) = p.id

The output of the above will correctly show al the combinations

defaultdb-> select * from path_finder order by 1,2;
      id       |                   path                    
---------------+-------------------------------------------
 TOPIC_0000000 | TOPIC_0000054.TOPIC_0000000
 TOPIC_0000000 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000000
 TOPIC_0000001 | TOPIC_0000054.TOPIC_0000001
 TOPIC_0000001 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000001
 TOPIC_0000002 | TOPIC_0000054.TOPIC_0000002
 TOPIC_0000002 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000002
 TOPIC_0000054 | TOPIC_0000070.TOPIC_0000054
 TOPIC_0000070 | TOPIC_0000070
(8 rows)

2. Ranking

Now it's time to rank the output of step 1, we'll rank based on the number of dots (.) in the string, we can do that with

SELECT *,
          rank() OVER (PARTITION BY id
                       ORDER BY CHAR_LENGTH(PATH) - CHAR_LENGTH(REPLACE(PATH, '.', '')) DESC) rnk
   FROM path_finder
   WHERE PATH <> ''
   ORDER BY 1,
            2

result

      id       |                   path                    | rnk 
---------------+-------------------------------------------+-----
 TOPIC_0000000 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000000 |   1
 TOPIC_0000000 | TOPIC_0000054.TOPIC_0000000               |   2
 TOPIC_0000001 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000001 |   1
 TOPIC_0000001 | TOPIC_0000054.TOPIC_0000001               |   2
 TOPIC_0000002 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000002 |   1
 TOPIC_0000002 | TOPIC_0000054.TOPIC_0000002               |   2
 TOPIC_0000054 | TOPIC_0000070.TOPIC_0000054               |   1
 TOPIC_0000070 | TOPIC_0000070                             |   1
(8 rows)

3. Final select filtering only the rows with ranking = 1

Full code

WITH RECURSIVE path_finder AS
  (SELECT t.id,
          t.path PATH
   FROM topics t
   UNION SELECT pf.id,
                p.path || coalesce('.' || pf.id, p.id)
   FROM topics p
   INNER JOIN path_finder pf ON substring(pf.path, 0, position('.' in pf.path)) = p.id
   WHERE p.id <> p.path ),
               ranking_rows AS
  (SELECT *,
          rank() OVER (PARTITION BY id
                       ORDER BY CHAR_LENGTH(PATH) - CHAR_LENGTH(REPLACE(PATH, '.', '')) DESC) rnk
   FROM path_finder
   WHERE PATH <> ''
   ORDER BY 1,
            2)
SELECT *
FROM ranking_rows
WHERE rnk=1;

result

      id       |                   path                    | rnk 
---------------+-------------------------------------------+-----
 TOPIC_0000000 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000000 |   1
 TOPIC_0000001 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000001 |   1
 TOPIC_0000002 | TOPIC_0000070.TOPIC_0000054.TOPIC_0000002 |   1
 TOPIC_0000054 | TOPIC_0000070.TOPIC_0000054               |   1
 TOPIC_0000070 | TOPIC_0000070                             |   1
(5 rows)
Ftisiot
  • 1,808
  • 1
  • 7
  • 13
1

I would start by splitting your rows into complete and incomplete based on whether the row's path begins at a root id.

with top_paths as (   -- Find the roots
  select path as top
    from topics 
   where nlevel(path) = 1
), complete as (      -- Find the complete paths
  select t.*
    from topics t
         join top_paths tp
           on tp.top = subpath(t.path, 0, 1)
), incomplete as (    -- Get the incomplete paths
  select *
    from topics t
   where id not in (select id from complete)
)

I would then join the incomplete records back to the complete records and apply some ltree surgery to rewrite the incomplete paths.

select *
  from complete
union all
select i.id, c.path||subpath(i.path, 1, 999) as path
  from incomplete i
       join complete c  -- make this a left join if you are worried about breaking the tree
         on c.id::ltree = subpath(i.path,  0, 1)
 order by path;

┌───────────────┬───────────────────────────────────────────┐
│      id       │                   path                    │
├───────────────┼───────────────────────────────────────────┤
│ TOPIC_0000070 │ TOPIC_0000070                             │
│ TOPIC_0000054 │ TOPIC_0000070.TOPIC_0000054               │
│ TOPIC_0000000 │ TOPIC_0000070.TOPIC_0000054.TOPIC_0000000 │
│ TOPIC_0000001 │ TOPIC_0000070.TOPIC_0000054.TOPIC_0000001 │
│ TOPIC_0000002 │ TOPIC_0000070.TOPIC_0000054.TOPIC_0000002 │
└───────────────┴───────────────────────────────────────────┘
(5 rows)

Feeding this into an update statement, you will not need the complete rows.

with top_paths as (
  select path as top
    from topics 
   where nlevel(path) = 1
), complete as (
  select t.*
    from topics t
         join top_paths tp
           on tp.top = subpath(t.path, 0, 1)
), incomplete as (
  select *
    from topics t
   where id not in (select id from complete)
), fixes as (
  select i.id, c.path||subpath(i.path, 1, 999) as path
    from incomplete i
         join complete c
           on c.id::ltree = subpath(i.path,  0, 1)
)
update topics
   set path = fixes.path
  from fixes
 where fixes.id = topics.id;

Results:

select * from topics order by path;

┌───────────────┬───────────────────────────────────────────┐
│      id       │                   path                    │
├───────────────┼───────────────────────────────────────────┤
│ TOPIC_0000070 │ TOPIC_0000070                             │
│ TOPIC_0000054 │ TOPIC_0000070.TOPIC_0000054               │
│ TOPIC_0000000 │ TOPIC_0000070.TOPIC_0000054.TOPIC_0000000 │
│ TOPIC_0000001 │ TOPIC_0000070.TOPIC_0000054.TOPIC_0000001 │
│ TOPIC_0000002 │ TOPIC_0000070.TOPIC_0000054.TOPIC_0000002 │
└───────────────┴───────────────────────────────────────────┘
(5 rows)

Mike Organek
  • 11,647
  • 3
  • 11
  • 26