How to efficiently query hierarchy data?

Question

I have a system build with micro-services (user service and device service) with hierarchy data:

Data is separated by organization. Users in OrgA can see all the data (including data in OrgB, OrgC, OrgD, OrgE, OrgF and OrgG). However, users in OrgG can only see the data in OrgG.

organization table (using nested set model + Adjacency List Model):

CREATE TABLE organization (
  id serial primary key,
  name varchar(50),
  parent_id int,
  lft int,
  rgh int
);

user table:

CREATE TABLE user (
  id serial primary key,
  name varchar(20),
  org_id int
);

device table (use org_id field to separate organization's device):

CREATE TABLE device (
  id serial primary key,
  mac varchar(20),
  org_id int
);

I created an API to use the nested set model algorithm to find children nodes:

SELECT p.id
FROM organization as p
INNER JOIN user as u ON u.org_id = p.id
INNER JOIN organization as c ON p.lft <= c.lft AND p.rgh >= c.rgh;

When I need to query the authorized device list in device micro service, I need to call the user service to get an authorized organization list first, and use this list to filter:

SELECT * FROM device WHERE org_id IN ([MY-AUTHORIZED-ORGANIZAITON-ID-LIST]);

But when I have a deep hierarchy the database has to perform an IN operator on a large set. This seems not efficient. How to implement the hierarchy data query in micro-service environment without losing performance?

score 1 · Answer 1 · answered Aug 09 '23 at 16:45

This seems not every efficient.

First of all I would argue that "seems" should be removed from you vocabulary when talking about perfromance. Do not assume, determine the potential sizes, SLAs and test.

Secondary I would be much more concerned about the joins based on the range conditions to build the organization id list then perfromance of the IN (alternatively you can consider join for example with temporary table or subquery based on the provided organization id).

PostgreSQL supports Recursive Queries and I would argue that you should use those to build the hierarchy.

For example from Nested set model Wiki page:

Performance

Queries using nested sets can be expected to be faster than queries using a stored procedure to traverse an adjacency list, and so are the faster option for databases which lack native recursive query constructs, such as MySQL 5.x.[ However, recursive SQL queries can be expected to perform comparably for 'find immediate descendants' queries, and much faster for other depth search queries, and so are the faster option for databases which provide them, such as PostgreSQL, Oracle, and Microsoft SQL Server.

Also you can consider using the ltree PostgreSQL extension:

This module implements a data type ltree for representing labels of data stored in a hierarchical tree-like structure. Extensive facilities for searching through label trees are provided.

Finally if all of those options are not good enough you can consider:

Using graph databases
Since usually adding new organizations or changing the hierarchy structure is not that frequent operation so you can materialize the current hierarchy into some other store (separate table in the database, some cache like Redis, etc.) which will contain the flattened hierarchy for all organizations and/or devices (for example table containing all device.id - organizations.id pairs) and rebuild it when something happens.

How to efficiently query hierarchy data?

1 Answers1

Performance