3

Is there a simple way to select the root node of a subtree (PostgreSQL ltree) from a query which returns (potentially) several descendant nodes of that same subtree? I've implemented a rather verbose algorithm for achieving the task (~40 lines, indented and formatted), but it would be awesome if I could leverage the fact that ltree data are in fact trees and have an easily accessible root node. It is important to note that several, distinct subtree roots may be returned from a single query, so I cannot merely sort the data and grab the top result.

June 07, 2012: I have updated the query to my most recent version, which cuts the time complexity in half. It uses a self-anti-join (if you will) to remove all nodes from the subtree which have ancestors in the subtree.

Essentially, my algorithm works as follows:

WITH roots AS
(
  /* Place any query here, which returns a field "ancestry" of type ltree */
)
SELECT roots.*
  FROM roots
WHERE NOT EXISTS
(
  SELECT 1
    FROM roots AS ancestors
  WHERE ancestors.ancestry @> roots.ancestry
    AND ancestors.id <> roots.id
);

(for more details, please see my gist, here: https://gist.github.com/1507368)

Dylon
  • 1,730
  • 15
  • 14
  • I am using version 9.1.2 – Dylon Dec 20 '11 at 18:57
  • Your question is contradictory. You want to determine the (one) root node while claiming there are `several, distinct subtree roots`. Both at the same time? – Erwin Brandstetter Dec 21 '11 at 19:48
  • It is not contradictory, please see my gist, here: https://gist.github.com/1507368 – Dylon Dec 21 '11 at 19:52
  • Any node with parent_id=NULL is considered the root of its tree, so there may be "m" trees (with "m" being the number of nodes having parent_id=NULL); as well, I am considering the nodes at level=0 to be the roots of their corresponding subtrees. The subtrees are selected according to both the physical structure of the nodes AND the predicates, so there may be "n" subtrees returned from a single tree, which would then have "n" roots (one per subtree). Here is the problem: there may be "k" descendants of any of those subtrees returned, but I only want the roots. – Dylon Dec 21 '11 at 19:58

1 Answers1

3

Can't you just use the subpath() function?

SELECT
  SUBPATH(ancestry, 0, 1)
FROM
  some_table;
Kouber Saparev
  • 7,637
  • 2
  • 29
  • 26
  • No, take this hierarchy for example: 1 <- 2 <- 3 <- 4 <- 5, where 2 inherits from 1, 3 inherits from 2, etc. Here, 1 is the root node of the tree. Now, let us assume that, according to the predicate for the JOIN above, the subtree returned from the query is this: 3 <- 4 <- 5 (excluding nodes 1 and 2). If I were to use the SUBPATH function, it would return 1 when I need it to return 3. This was a good suggestion, though. Thanks! – Dylon Dec 21 '11 at 14:53
  • Then I don't get it... if something is returning the path '3.4.5', evaluating subpath('3.4.5', 0, 1) would give you 3. Can you provide some sample structure, and sample data for these two tables, as well as the sample result you want to obtain? Mind the possibility to use nlevel(ancestry) in combination with subpath(). – Kouber Saparev Dec 21 '11 at 15:28
  • If the ancestry of node 5 was indeed '3.4.5', then that would work, but its ancestry is '1.2.3.4.5', for which SUBPATH(ancestry, 0, 1) would return 1, not 3. I've created a gist, here: https://gist.github.com/1507368 – Dylon Dec 21 '11 at 19:42
  • Using NLEVEL(ancestry) might work, but how would you determine the level of the subtree which would return the correct root? – Dylon Dec 21 '11 at 19:49
  • 1
    The way you did it by using common table expressions is quite good I think. There's no other optimal way to respect condition #3 otherwise. – Kouber Saparev Dec 23 '11 at 17:33