4

I have multiple categories, which can have None or one or multiple sub-categories.

The process theoretically can go to infinite. So, it is like having multiple trees.

Tree example.

A
 - A1
     - A11
     - A12
-A2
B
C
 - C1

I have also Item(s). An Item can be in multiple categories.

At this moment to connect the categories, in database I use three fields:

  • children (the children of a category),

  • path([1,4,8], basically are the ids of grandparent, parent, and category itself)

  • depth, represent the level in the tree for each category

Using this fields I avoid some recursive and using more queries.

I usually retrieve data like:

  • top categories (depth 0)

  • subcategories of a category

  • sibling categories

  • items in the category (for example a grandparent category, will show its direct items, the items of children, and the items of grandchildren)

At this moment I'm using Django(want to move to FastAPI) and PostgreSQL, and every time I have CRUD operations on categories, the three fields (path,depth,children)will be modified.

I'm thinking maybe is a better way, to maintain/retrieve the categories tree and corresponding items.

GMB
  • 216,147
  • 25
  • 84
  • 135
user3541631
  • 3,686
  • 8
  • 48
  • 115

3 Answers3

4

There are various possible strategies to store a tree in a database.

Storing full paths in an array as you currently is one of them. But with this solution is is hard to enforce referential integrity (how do you guarantee that these ids in the arrays really exist in the table?), and simple tree operations are tedious (how do you enumerate the direct children of given node?).

The answer by @VesaKarjalainen suggests using the adjacency list model, a separate table where each element refers to its immediate ancestor. It works, but has downsides: typically, is that complicated to traverse the hierarchy (like get all children or parents of a given node): you need some kind of iteration or recursion for this, which SQL engines do not do efficiently.

I would recommend the closure table approach. This works by creating a separate table that stores all possible paths in the tree, like so:

create table category_path (
    parent_id int,
    child_id int,
    level int,
    primary key(parent_id, child_id),
    foreign key(parent_id) references category(id),
    foreign key(parent_id) references category(id)
);

For this tree structure that you provided:

        A       B     C 
       / \            |
     A1   A2          C1
     /\
  A11  A12

You would store the following data:

parent_id    child_id    level
A            A           0
A            A1          1
A            A2          1
A            A11         2
A            A12         2
A1           A11         1
A1           A12         1
B            B           0
C            C           0
C            C1          1

Now, say you want to retrieve all children of a given category, that's as simple as:

select * from category_path where parent_id = 'A'

To get all the parents, you would just replace where parent_id = ... with where child_id = ....

You can bring in the main table with a join:

select c.*
from category_path cp
inner join categories c on c.id = cp.parent_id
where cp.parent_id = 'A'
GMB
  • 216,147
  • 25
  • 84
  • 135
  • The recursive query in PostgreSQL is quite efficient and in most cases the category tree will be cached. Anyway, the model you suggest or any other pre-processed format can easily be created from the basic model I gave - and there is no risk of corrupt links when the tree is edited. Also, making this expanded version manually would be quite error prone - you need a program for that anyway. – Vesa Karjalainen Jan 11 '20 at 17:40
3

Use a recursive CTE query to make a hierarchy tree. Depending on your hierarchy size and typical queries indexing and automatic caching might be enough to make this fast enough. Otherwise, a materialized view might be a good approach.

You can choose to use a separate TOP node if you need it, or have top level nodes have NULL parent. Having several nodes like TOP makes it possible to have several trees in the same table. Also, making a query for a single downstream node and upwards shouldn't be difficult.

DROP TABLE IF EXISTS category;

CREATE TABLE category (
    id varchar PRIMARY KEY,
    parent varchar
);

COPY category (id,parent)
FROM  stdin WITH DELIMITER ';';
TOP;\N
1;TOP
2;TOP
1A;1
1B;1
1A1;1A
1A2;1A
\.

WITH RECURSIVE tree AS (
  SELECT
    id,
    parent,
    id  AS path
  FROM
    category
  WHERE
    parent IS NULL
UNION
  SELECT
    c.id,
    c.parent,
    p.path || ' -> ' || c.id
  FROM
    category c
  INNER JOIN
    tree p
   ON c.parent = p.id
  )

SELECT * FROM tree
ORDER BY path;
Vesa Karjalainen
  • 1,087
  • 8
  • 15
1

If you are planning on sticking with django for your project, and would like something a little more "out of the box", you should take a look at django-treebeard. This is utilized in large python projects that require tree structures in the database like Wagtail.

MrName
  • 2,363
  • 17
  • 31