15

I have a folder table that joins to itself on an id, parent_id relationship:

CREATE TABLE folders (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  title nvarchar(255) NOT NULL,
  parent_id int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (id)
);

INSERT INTO folders(id, title, parent_id) VALUES(1, 'root', null);
INSERT INTO folders(id, title, parent_id) values(2, 'one', 1);
INSERT INTO folders(id, title, parent_id) values(3, 'target', 2);
INSERT INTO folders(id, title, parent_id) values(4, 'child one', 3);
INSERT INTO folders(id, title, parent_id) values(5, 'child two', 3);
INSERT INTO folders(id, title, parent_id) values(6, 'root 2', null);
INSERT INTO folders(id, title, parent_id) values(7, 'other child one', 6);
INSERT INTO folders(id, title, parent_id) values(8, 'other child two', 6);

I want a query that returns all the parents of that record, right back to the route and any children.

So if I ask for folder with id=3, I get records: 1, 2, 3, 4, 5. I am stuck how to get the parents.

The version of MYSQL is 5.7 and there are no immediate plans to upgrade so sadly CTEs are not an option.

I have created this sql fiddle

GMB
  • 216,147
  • 25
  • 84
  • 135
dagda1
  • 26,856
  • 59
  • 237
  • 450

8 Answers8

8

In your table design, ID and PARENT_ID corresponds to the "Adjacency List Model" for storing a tree.

There is another design, called the "Nested Set Model", which makes it easier to perform the operations you want here.

See this excellent article from Mike Hillyer describing both: managing-hierarchical-data-in-mysql

In summary:

The tree is stored in a table like:

CREATE TABLE nested_category (
        category_id INT AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(20) NOT NULL,
        lft INT NOT NULL,
        rgt INT NOT NULL
);

Finding the path from the root to a given node (here, 'FLASH'):

SELECT parent.name
FROM nested_category AS node,
        nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.name = 'FLASH'
ORDER BY parent.lft;

Finding all children of a given node (here 'PORTABLE ELECTRONICS'):

SELECT node.name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM nested_category AS node,
        nested_category AS parent,
        nested_category AS sub_parent,
        (
                SELECT node.name, (COUNT(parent.name) - 1) AS depth
                FROM nested_category AS node,
                        nested_category AS parent
                WHERE node.lft BETWEEN parent.lft AND parent.rgt
                        AND node.name = 'PORTABLE ELECTRONICS'
                GROUP BY node.name
                ORDER BY node.lft
        )AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
        AND sub_parent.name = sub_tree.name
GROUP BY node.name
HAVING depth <= 1
ORDER BY node.lft;

After renaming to your folders table

  • TABLE nested_category -> TABLE folders
  • Column category_id -> Column id
  • Column name -> Column title

The solution is:

CREATE TABLE folders (
        id INT AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(20) NOT NULL,
        lft INT NOT NULL,
        rgt INT NOT NULL
);

INSERT INTO folders(id, title, lft, rgt) values(1, 'root', 1, 10);
INSERT INTO folders(id, title, lft, rgt) values(2, 'one', 2, 9);
INSERT INTO folders(id, title, lft, rgt) values(3, 'target', 3, 8);
INSERT INTO folders(id, title, lft, rgt) values(4, 'child one', 4, 5);
INSERT INTO folders(id, title, lft, rgt) values(5, 'child two', 6, 7);
INSERT INTO folders(id, title, lft, rgt) values(6, 'root 2', 11, 16);
INSERT INTO folders(id, title, lft, rgt) values(7, 'other child one', 12, 13);
INSERT INTO folders(id, title, lft, rgt) values(8, 'other child two', 14, 15);

Path to the target:

SELECT parent.title
FROM folders AS node,
        folders AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.title = 'target'
ORDER BY parent.lft;

Target children:

SELECT node.title, (COUNT(parent.title) - (sub_tree.depth + 1)) AS depth
    FROM folders AS node,
            folders AS parent,
            folders AS sub_parent,
            (
              SELECT node.title, (COUNT(parent.title) - 1) AS depth
                    FROM folders AS node,
                            folders AS parent
                    WHERE node.lft BETWEEN parent.lft AND parent.rgt
                            AND node.title = 'target'
                    GROUP BY node.title
                    ORDER BY node.lft
            )AS sub_tree
    WHERE node.lft BETWEEN parent.lft AND parent.rgt
            AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
            AND sub_parent.title = sub_tree.title
    GROUP BY node.title
    HAVING depth <= 1
    ORDER BY node.lft;

See sqlfiddle

To get all the data in a single query, a union should do.

GMB
  • 216,147
  • 25
  • 84
  • 135
Marc Alff
  • 8,227
  • 33
  • 59
7

In MySQL 8.0, you can make use of the Recursive Common Table Expressions to adress this use case.

The following query gives you the parents of a given record (including the record itself):

with recursive parent_cte (id, title, parent_id) as (
  select id, title, parent_id
  from folders
  where id = 3
  union all
  select  f.id, f.title, f.parent_id
  from folders f
  inner join parent_cte pc on f.id = pc.parent_id
)
select * from parent_cte;
| id  | title  | parent_id |
| --- | ------ | --------- |
| 3   | target | 2         |
| 2   | one    | 1         |
| 1   | root   |           |

And here is a slightly different query, that returns the children tree of a given record:

with recursive children_cte (id, title, parent_id) as (
  select id, title, parent_id
  from folders
  where parent_id = 3
  union all
  select  f.id, f.title, f.parent_id
  from folders f
  inner join children_cte cc on f.parent_id = cc.id
)
select * from children_cte;
| id  | title     | parent_id |
| --- | --------- | --------- |
| 4   | child one | 3         |
| 5   | child two | 3         |

Both queriers can be combined as follows:

with recursive parent_cte (id, title, parent_id) as (
  select id, title, parent_id
  from folders
  where id = 3
  union all
  select  f.id, f.title, f.parent_id
  from folders f
  inner join parent_cte pc on f.id = pc.parent_id
),
children_cte (id, title, parent_id) as (
  select id, title, parent_id
  from folders
  where parent_id = 3
  union all
  select  f.id, f.title, f.parent_id
  from folders f
  inner join children_cte cc on f.parent_id = cc.id
)
select * from parent_cte
union all select * from children_cte;
| id  | title     | parent_id |
| --- | --------- | --------- |
| 3   | target    | 2         |
| 2   | one       | 1         |
| 1   | root      |           |
| 4   | child one | 3         |
| 5   | child two | 3         |

Demo on DB Fiddle

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Could have used this functionality years ago, wasn't aware of it. – EternalHour Mar 21 '19 at 21:48
  • It wasn't available in MySQL until version 8.0.1, released April 2017. Other SQL products have had it for some years. See my answer to https://stackoverflow.com/questions/324935/mysql-with-clause/325243#325243 – Bill Karwin Mar 21 '19 at 21:52
  • 4
    it is a great answer but sadly I'm on 5.7 in the crappy SAS we are using. I can do it in code but is there a way pre CTE to do this? Any pointers at all appreciated – dagda1 Mar 22 '19 at 08:54
3

I've solved this in the past with a second table, which contains the transitive closure of all paths through the tree.

mysql> CREATE TABLE folders_closure (
 ancestor INT UNSIGNED NOT NULL,
 descendant INT UNSIGNED NOT NULL,
 PRIMARY KEY (ancestor, descendant),
 depth INT UNSIGNED NOT NULL
);

Load this table with tuples of all ancestor-descendant pairs, including the ones where a node in the tree references itself (path of length 0).

mysql> INSERT INTO folders_closure VALUES
     (1,1,0), (2,2,0), (3,3,0), (4,4,0), (5,5,0), (6,6,0),
     (1,2,1), (2,3,1), (3,4,1), (3,5,1), (1,4,2), (1,5,2),
     (6,7,1), (6,8,1);

Now you can query the tree below a given node by querying all the paths that start at the top node, and join that path's descendant to your folders table.

mysql> SELECT d.id, d.title, cl.depth FROM folders_closure cl
     JOIN folders d ON d.id=cl.descendant WHERE cl.ancestor=1;
+----+-----------+-------+
| id | title     | depth |
+----+-----------+-------+
|  1 | root      |     0 |
|  2 | one       |     1 |
|  4 | child one |     2 |
|  5 | child two |     2 |
+----+-----------+-------+

I see many people recommend the Nested Sets solution which was introduced in 1992, and became popular after Joe Celko included it in his book SQL for Smarties in 1995. But I don't like the Nested Sets technique, because the numbers aren't actually references to the primary keys of the nodes in your tree, and it requires renumbering many rows when you add or delete a node.

I wrote about the closure table method in What is the most efficient/elegant way to parse a flat table into a tree? and some of my other answers with the hierarchical-data tag.

I did a presentation about it: Models for Hierarchical Data.

I also covered this in a chapter of my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • How does the size of the closure table grow, when the number of nodes in the tree (N) grows ? – Marc Alff Mar 28 '19 at 16:01
  • 1
    In theory, `O(n^2 / 2)` but in practice it's usually less than that. – Bill Karwin Mar 28 '19 at 16:17
  • I tested with a tree of 518,000 nodes, and if I recall it created fewer than 5 million rows in the closure table. – Bill Karwin Mar 28 '19 at 16:26
  • @MarcAlff `N²/2` is the worst case - when you have a single chain, which is not really a tree. The exact number is `N * average_depth`. For a balanced binary tree it is `N * log2(N)`, which is something like 20M rows for 1M nodes. But most trees are not binary. For example: The average nesting depth in frequently used tree-style forum will almost stop growing at some point, and you will end up with something like 5x to 10x rows per post. So 5M rows for 500K rows is a realistic number. – Paul Spiegel Mar 29 '19 at 22:17
  • Yes, my test data was the taxonomy of all plants, animals, and fungi from https://www.itis.gov. – Bill Karwin Mar 29 '19 at 22:45
3

If it's guaranteed that child nodes always have a higher id than it's parent, then you could use user variables.

Get descendants:

select f.*, @l := concat_ws(',', @l, id) as dummy
from folders f
cross join (select @l := 3) init_list
where find_in_set(parent_id, @l)
order by id

Result:

id | title     | parent_id | dummy
---|-----------|-----------|------
 4 | child one |         3 | 3,4
 5 | child two |         3 | 3,4,5

Get ancestors (including itself):

select f.*, @l := concat_ws(',', @l, parent_id) as dummy
from folders f
cross join (select @l := 3) init_list
where find_in_set(id, @l)
order by id desc

Result:

id | title  | parent_id | dummy
 3 | target |         2 | 3,2
 2 | one    |         1 | 3,2,1
 1 | root   |      null | 3,2,1

Demo

Note that this technique relies on undocumented evaluation order, and will not be possible in future versions.

Also it is not very performant, since both queries need a full table scan, but might be fine for smaller tables. However - for small tables I would just fetch the full table and solve the task with a recursive function in application code.

For bigger tables I would consider a more complex solution like the following stored procedure:

create procedure get_related_nodes(in in_id int)
begin
  set @list = in_id;
  set @parents = @list;

  repeat
    set @sql = '
      select group_concat(id) into @children
      from folders
      where parent_id in ({parents})
    ';
    set @sql = replace(@sql, '{parents}', @parents);
    prepare stmt from @sql;
    execute stmt;
    set @list = concat_ws(',', @list, @children);
    set @parents = @children;
  until (@children is null) end repeat;

  set @child = in_id;
  repeat
    set @sql = '
      select parent_id into @parent
      from folders
      where id = ({child})
    ';
    set @sql = replace(@sql, '{child}', @child);
    prepare stmt from @sql;
    execute stmt;
    set @list = concat_ws(',', @parent, @list);
    set @child = @parent;
  until (@parent is null) end repeat;

  set @sql = '
    select *
    from folders
    where id in ({list})
  ';
  set @sql = replace(@sql, '{list}', @list);
  prepare stmt from @sql;
  execute stmt;
end

Use it with

call get_related_nodes(3)

This will return

id | title     | parent_id
---|-----------|----------
 1 | root      | 
 2 | one       | 1
 3 | target    | 2
 4 | child one | 3
 5 | child two | 3

Demo

I expect this procedure to perform as good as a recursive CTE query. In any case you should have an index on parent_id.

Paul Spiegel
  • 30,925
  • 5
  • 44
  • 53
  • Could you please explain or direct me to a good resource regarding the working style of "cross join (select @l := 3) init_list"? That is a really short code and would like to learn more on it. – surpavan Mar 30 '19 at 20:52
  • 1
    @surpavan `cross join (select @l := 3) init_list` is a "trick" to avoid an extra query `set @l = 3;`. MySQL will execute/evaluate a "constant" subquery like this as the first step (you can see that when you use [EXPLAIN](https://www.db-fiddle.com/f/7QyB8q5W6GghnMnL5cTFWo/0)). So this line is just initializing the `@l` variable. You will find this "trick" in many other similar answers. – Paul Spiegel Mar 30 '19 at 20:59
  • After some digging, got to some point of understanding; but when i try for the value 7 (code screenshot: https://pasteboard.co/I7S6QJP.png), there are 5 rows with id in @l as per select statement, but when WHERE is applied, it is returning only 2 rows (although it is correct). – surpavan Mar 30 '19 at 21:45
  • 1
    @surpavan `@l := concat_ws(',', @l, id)` in the SELECT clause will append the `id` to `@l`, but only if the WHERE condition is TRUE for that row. IF you remove the WHERE clause, all ids will be appended. If you want to "play around" - try this: https://www.db-fiddle.com/f/vEVeEbLyKuBqCkLfuCz6M4/0 – Paul Spiegel Mar 30 '19 at 22:05
  • 1
    Actually for "parents" you don't need a list, and can use this: https://www.db-fiddle.com/f/8w1Jrfw4gXZzckudHjKzUz/0 – Paul Spiegel Mar 30 '19 at 22:09
  • Great explanation for the flow. It takes me to 101 sql in execution flow. Thank you for the valuable time and knowledge sharing. – surpavan Mar 30 '19 at 22:20
  • 2
    Be careful though and read my note in the answer. There is no such thing as "execution flow" or "execution order" in SQL, because SQL is not a procedural language. This only works due to MySQL's internal implementation. They don't want to "break" existing code, which relies on this implementation. But at the same time they need the "freedom" to change things. That's why this technique is **deprecated** and something like `@l := id` will not work in future versions (maybe already in 8.1). But from 8.0 on you better use a *recursive CTE* as suggested by GMB. – Paul Spiegel Mar 30 '19 at 22:35
  • Agreed, Thank you for the note. I tried the new .net connector, and it really has all new things and the generic way of connecting/executing sql code has changed completely, old code is not working at all. Same goes with the @l execudtion, understood. Thank you. – surpavan Mar 31 '19 at 09:30
2

if your parent_id comes always in ascending order then below query is the great solution.

if you get the result your id to null parent value then Please follow the link http://www.sqlfiddle.com/#!9/b40b8/258 (When passing id = 6) http://www.sqlfiddle.com/#!9/b40b8/259 (When passing id = 3)

SELECT * FROM folders f
WHERE id = 3 
OR 
  (Parent_id <=3 AND Parent_id >= 
  (SELECT id FROM folders Where id <= 3 AND parent_id IS NULL Order by ID desc LIMIT 1)) OR (id <= 3 AND IFNULL(Parent_id,0) = 0)
  AND id >= (SELECT id FROM folders Where id <= 3 AND parent_id IS NULL Order by ID desc LIMIT 1);

OR

You won't get your passing id to top at parent then please follow the link as below. http://www.sqlfiddle.com/#!9/b40b8/194 (When passing id =3)
http://www.sqlfiddle.com/#!9/b40b8/208 (When passing id =6)

SELECT 
      * 
FROM 
     folders f
WHERE 
    id = 3 OR Parent_id <=3 
    OR (id <= 3  AND IFNULL(Parent_id,0) = 0);
Hemang A
  • 1,012
  • 1
  • 5
  • 16
1

Note My solution is more or less same as @Marc Alff. Didn't realise it was already there before typing / preparing response in an editor.

It is very difficult to get a query to achieve your objective (or other typical requirements of hierarchical dataset) without use of CTEs or other hierarchical query supports (e.g. prior, connect by in Oracle). This was the main driver for databases to come up with CTEs etc.

Many many years ago when such support for modelling hierarchical entities weren't available in databases, requirements outlined by you and many other related were solved by modelling such entities slightly differently.

The concept is simple. In essence, two more attributes are introduced in the hierarchical table (or a separate table foreign keyed into hierarchical table) called left_boundary and right_boundary (call whatever you wish after all what’s in the name). For each row the values (numbers) for these attributes are so chosen that they cover the values of these attributes for all their children. In other words, a child’s left and right boundaries will be between left and right boundaries of its parents.

By the way of example

enter image description here

Creating this hierarchy used to be part of an early morning batch job or the boundaries were chosen so wide apart during design time that they were easily covering all depths of tree.

I am going to use this solution to achieve your objective. Firstly I will introduce a second table (could have introduced the attributes in the same table, decided not to disturb your data model)

CREATE TABLE folder_boundaries (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  folder_id int(10) unsigned NOT NULL,
  left_boundary int(10) unsigned,
  right_boundary int(10) unsigned,
  PRIMARY KEY (id),
  FOREIGN KEY (folder_id) REFERENCES folders(id)
);

The data for this table based on your dataset

NSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(1, 1, 10);
INSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(2, 2, 9);
INSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(3, 3, 8);
INSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(4, 4, 4);
INSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(5, 4, 4);
INSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(6, 21, 25);
INSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(7, 22, 22);
INSERT INTO folder_boundaries(folder_id, left_boundary, right_boundary) VALUES(7, 22, 22);

Here is the query to achieve what you are after

select f.id, f.title
from folders f
join folder_boundaries fb on f.id = fb.folder_id
where fb.left_boundary < (select left_boundary from folder_boundaries where folder_id = 3)
and fb.right_boundary > (select right_boundary from folder_boundaries where folder_id = 3)
union all
select f.id, f.title
from folders f
join folder_boundaries fb on f.id = fb.folder_id
where fb.left_boundary >= (select left_boundary from folder_boundaries where folder_id = 3)
and fb.right_boundary <= (select right_boundary from folder_boundaries where folder_id = 3)

Result

enter image description here

Gro
  • 1,613
  • 1
  • 13
  • 19
0

You can perform an union between parent rows and child rows like this :

select title, id, @parent:=parent_id as parent from
               (select @parent:=3 ) a join (select * from folders order by id desc) b where @parent=id
union select title, id, parent_id as parent from folders where  parent_id=3 ORDER BY id

here a sample dbfiddle

Abdelkarim EL AMEL
  • 1,515
  • 1
  • 10
  • 10
0

Suppose you know the maximum depth of the tree, you could "create" a loop to get what you want:

Get parent nodes:

SELECT  @id :=
        (
        SELECT  parent_id
        FROM    folders
        WHERE   id = @id
        ) AS folderId, vars.id
FROM    (
        SELECT  @id := 7 AS id
        ) vars
INNER JOIN (
    SELECT 0 AS nbr UNION ALL SELECT 1 UNION ALL SELECT 2 
 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 
 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 
 UNION ALL SELECT 9) temp
WHERE @id IS NOT NULL

Get child nodes:

SELECT  @id :=
        (
        SELECT  GROUP_CONCAT(id)
        FROM    folders
        WHERE   FIND_IN_SET(parent_id, @id)
        ) AS folderIds, vars.id
FROM    (
        SELECT  @id := 1 AS id
        ) vars
INNER JOIN (
    SELECT 0 AS nbr UNION ALL SELECT 1 UNION ALL SELECT 2 
 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 
 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 
 UNION ALL SELECT 9) temp
WHERE @id IS NOT NULL

This works by

  • Creating a join between a static variable subquery (SELECT @id := 1 AS id) and a static set of 10 rows in this case(maximum depth)
  • using a subquery in the select to traverse the tree and find all the parents or child nodes

The purpose of the join is to create a result set of 10 rows, so that the subquery in the select is executed 10 times.

Alternatively, if you do not know the maximum depth, you could replace the joined subquery with

INNER JOIN (
SELECT 1 FROM folder) temp

or in order to avoid all the union selects above, use with a limit:

INNER JOIN (
SELECT 1 FROM folder LIMIT 100) temp

References: - Hierarchical queries in MySQL

Jannes Botis
  • 11,154
  • 3
  • 21
  • 39