How to JOIN tables without extra duplicates in multiple one-to-many relationship

Question

I have two child tables with a one-to-many relationship to the parent table. I want to join them without extra duplication.

In the actual schema, there are more one-to-many relationships to this parent and to child tables. I'm sharing a simplified schema to make the root of the problem to be easy to be seen.

Any suggestion is highly appreciated.

CREATE TABLE computer (
    id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE c_user (
    id SERIAL PRIMARY KEY,
    computer_id INT REFERENCES computer,
    name TEXT
);

CREATE TABLE c_accessories (
    id SERIAL PRIMARY KEY,
    computer_id INT REFERENCES computer,
    name TEXT
);

INSERT INTO computer (name) VALUES ('HP'), ('Toshiba'), ('Dell');
INSERT INTO c_user (computer_id, name) VALUES (1, 'John'), (1, 'Elton'), (1, 'David'), (2, 'Ali');
INSERT INTO c_accessories (computer_id, name) VALUES (1, 'mouse'), (1, 'keyboard'), (1, 'mouse'), (2, 'mouse'), (2, 'printer'), (2, 'monitor'), (3, 'speaker');

This is my query:

SELECT 
    c.id
    ,c.name
    ,jsonb_agg(c_user.name)
    ,jsonb_agg(c_accessories.name)
FROM 
    computer c
JOIN 
    c_user ON c_user.computer_id = c.id
JOIN 
    c_accessories ON c_accessories.computer_id = c.id
GROUP BY c.id

I'm getting this result:

1   "HP"    ["John", "John", "John", "Elton", "Elton", "Elton", "David", "David", "David"]  ["mouse", "keyboard", "mouse", "mouse", "keyboard", "mouse", "mouse", "keyboard", "mouse"]
2   "Toshiba"   ["Ali", "Ali", ""Ali"]  ["monitor", "printer", "mouse"]

I want to get this result (by preserving duplicates if exist in database). And also be able to filter computers by user and/or accessory:

1 "HP" ["John", "Elton", "David"] ["keyboard", "mouse", "mouse"]
2 "Toshiba" ["Ali"] ["monitor", "printer", "mouse"]
3 "Dell" Null ["speaker"]

score 3 · Answer 1 · answered Jan 20 '22 at 21:27

3

Use subqueries instead of joins:

SELECT 
    c.id,
    c.name,
    (SELECT
        jsonb_agg(c_user.name)
        FROM c_user
        WHERE c_user.computer_id = c.id
    ) AS user_names,
    (SELECT
        jsonb_agg(c_accessories.name)
        FROM c_accessories
        WHERE c_accessories.computer_id = c.id
    ) AS accessory_names
FROM 
    computer c

answered Jan 20 '22 at 21:27

Bergi

630,263
148
957
1,375

Thank you for the fast response! Wouldn't that be inefficient if I do this over and over again, because I have much more of this type of relationship in my schema? Maybe there is another way of doing this without subqueries? I just want to know if this is the only way of overcoming this problem. – Ulvi Jan 20 '22 at 21:33
No, what would be inefficient about it? – Bergi Jan 20 '22 at 21:56
@Ulvi Sure, doing anything over and over will be inefficient. You can fix that by not doing it over and over. This looks like the kind of report you need to run once a month or so. Or you could turn it into a materialized view. – jjanes Jan 21 '22 at 00:20

sticky bit · Answer 2 · 2022-01-20T22:54:59.807

Join the users to a derived table that does the joining and aggregation of computers and accessories and aggregate again.

SELECT ca.id,
       jsonb_agg(u.name) AS users,
       ca.accessories
       FROM (SELECT c.id,
                    jsonb_agg(a.name) AS accessories
                    FROM computer AS c
                         LEFT JOIN c_accessories AS a
                                   ON a.computer_id = c.id
                    GROUP BY c.id) AS ca
            INNER JOIN c_user AS u
                       ON u.computer_id = ca.id
       GROUP BY ca.id,
                ca.accessories;

You could also first aggregate including the IDs of users and accessories, so that you can use DISTINCT in the aggregation function, for example into arrays of records. Reaggrete to JSON in subqueries.

SELECT c.id,
       (SELECT jsonb_agg(x.name)
               FROM unnest(array_agg(DISTINCT row(u.id, u.name))) AS x
                                                                     (id integer,
                                                                      name text)) AS users,
       (SELECT jsonb_agg(x.name)
               FROM unnest(array_agg(DISTINCT row(a.id, a.name))) AS x
                                                                     (id integer,
                                                                      name text)) AS accessories
       FROM computer AS c
            LEFT JOIN c_accessories AS a
                      ON a.computer_id = c.id
            INNER JOIN c_user AS u
                       ON u.computer_id = c.id
       GROUP BY c.id;

db<>fiddle

score 1 · Answer 3 · answered Jan 21 '22 at 11:08

1

Aggregate first, then join to the computer table to the result of the aggregation.

select c.id, c.name, 
       cu.users,
       ca.accessories
from computer c 
  left join (
    select computer_id, jsonb_agg(name) as users
    from c_user
    group by computer_id
  ) as cu on cu.computer_id = c.id
  left join (
    select computer_id, jsonb_agg(name) as accessories
    from c_accessories
    group by computer_id
  ) as ca on ca.computer_id = c.id

answered Jan 21 '22 at 11:08

With this approach how can I filter `computer`s by `user` and/or `accessories` name, efficiently (without `HAVING` clause)? – Ulvi Jan 21 '22 at 11:12
Well you can use a [JSON operator](https://www.postgresql.org/docs/current/functions-json.html) on the aggregate to do so, e.g.: `where cu.users ? 'John'` but using a `having` is probably more efficient. Why would you want to avaoid that? – Jan 21 '22 at 11:25
In the original schema, there are more columns (with foreign keys as well). In that case, the `users` and `accessories` columns eventually will become a list of `JSONB` objects (in some cases, nested). And I read that the `HAVING` clause is inefficient for filtering the list of `JSONB` objects and it is better to avoid that. That's why I thought it would be better to avoid it. – Ulvi Jan 21 '22 at 12:04
BTW, there is an answer which uses a different approach to avoid the `HAVING` clause to a related question that was asked from me as well. Do you think that approach could be more efficient in this context? Sorry for my silly question. I just want to understand which way is better to approach filtering in these types of cases as I face a lot. There are multiple ways of building queries for this case and filtering them makes it more complicated. – Ulvi Jan 21 '22 at 12:04

How to JOIN tables without extra duplicates in multiple one-to-many relationship

3 Answers3