1

This question is about the preferred way to retrieve grouped data from a database to an application. I have a schema with tables in a 1:n relationship. For example let's say it's owner:pet:

create table `owner` (
  `id` int(11) not null auto_increment,
  `name` varchar(20), primary key(`id`)
);
create table `pet`   (
  `id` int(11) not null auto_increment,
  `owner_id` int(11) not null,
  `name` varchar(20),
  primary key(`id`)
);

insert into owner (`name`)
  values ('alice'), ('bob'), ('clarice');
insert into pet (`owner_id`, `name`)
  values (1, 'fifi'), (1, 'mittens'),
         (2, 'barky'), (3, 'snowflake'),
         (3, 'max'), (3, 'tweety');

My application needs to know all owners and their pets. I've considered accomplishing this in three ways:

  • group by. My application needs to split the resulting data, and I have to be careful about choosing a delimiter that is not present in my values
select o.id, o.name, group_concat(p.name)
from owner o join pet p on o.id = p.owner_id group by o.id;
  • no group by. This avoids the need for splitting the pet names, but now I have to aggregate the data myself in the application layer. And if there's a lot of data being retrieved (e.g. much more data from the owner table) then I'm fetching a lot of redundant data.
select o.id, o.name, p.name
from owner o join pet p on o.id = p.owner_id;
  • multiple queries. First query to get owner IDs, then query pet names, and reconstruct data in my application.
select id, name from owner;
select owner_id, name from pet
where owner_id in (select owner_id from owner);

All of these have drawbacks. In reality my data has millions of records, and the equivalent of the 'owner' table above is a couple kb of data, so there is a very real cost to retrieving redundant data.

Is there a "BKM" for this scenario? My inclination is to go with the group by, and just accept that I have to choose a magic token for the delimiter, but wondering if I'm missing a better way to do this.

srh
  • 67
  • 1
  • 6
  • 1
    I generally recommend the second method. Use `ORDER BY o.id` so that all the rows in the group will be consecutive. You can then just check whether the current row has the same ID as the previous row to know whether to start a new group in the output. – Barmar Mar 19 '21 at 19:53
  • See https://stackoverflow.com/questions/27575562/how-can-i-list-has-same-id-data-with-while-loop-in-php/27575685#27575685 for an example – Barmar Mar 19 '21 at 19:53
  • I can't speak for 'BKM', but personally I think it's daft to 'stringify' atomic data, only to parse it again in application code. – Strawberry Mar 19 '21 at 19:54

1 Answers1

1

If you are concerned about the "delimiter", then use JSON:

select o.id, o.name,
       json_arrayagg(p.name)
from owner o join
     pet p
     on o.id = p.owner_id
group by o.id;

The JSON format should handle the delimiters okay. You will need to invoke a JSON parser in your application, though, to read the data correctly.

In practice, though, I would say that group_concat() would be a common choice, using a delimited such as '|' or '^'.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks, the JSON grouper definitely seems like the "right" answer. Unfortunately it's not supported on the version of MariaDB we're using. I'll just fall back to group_concat and accept the inelegance of it, but I'll keep this in my back-pocket for the future. – srh Mar 22 '21 at 05:53