This question is about the preferred way to retrieve grouped data from a database into an application. I have a schema with tables in a 1:n relationship. For example, let's say it's owner:pet:
create table `owner` (
`id` int(11) not null auto_increment,
`name` varchar(20),
primary key(`id`)
);
create table `pet` (
`id` int(11) not null auto_increment,
`owner_id` int(11) not null,
`name` varchar(20),
primary key(`id`)
);
insert into owner (`name`)
values ('alice'), ('bob'), ('clarice');
insert into pet (`owner_id`, `name`)
values (1, 'fifi'), (1, 'mittens'),
(2, 'barky'), (3, 'snowflake'),
(3, 'max'), (3, 'tweety');
My application needs to know all owners and their pets. I've considered accomplishing this in three ways:
- group by. My application needs to split the concatenated pet names back apart, and I have to be careful to choose a delimiter that is not present in my values (a variant with an explicit separator is sketched after the query).
select o.id, o.name, group_concat(p.name)
from owner o join pet p on o.id = p.owner_id group by o.id;
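For reference, the delimiter at least doesn't have to be group_concat's default ',': it accepts an explicit separator, though whatever token I pick still has to be absent from the pet names. A sketch (the '|' here is arbitrary):
-- same aggregation, but with an explicit separator instead of the default ','
select o.id, o.name,
group_concat(p.name separator '|') as pet_names
from owner o join pet p on o.id = p.owner_id
group by o.id, o.name;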
- no group by. This avoids the need to split the pet names, but now I have to aggregate the data myself in the application layer. And if there's a lot of data being retrieved (e.g. much wider rows in the owner table than in this toy example), then I'm fetching a lot of redundant data, because each owner's columns get repeated for every one of their pets.
select o.id, o.name, p.name
from owner o join pet p on o.id = p.owner_id;
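If I did go this way, ordering the rows would at least let the application do the grouping in a single streaming pass (every owner's pets arrive together) instead of keeping a map keyed by owner id; roughly:
-- ordering by owner id keeps each owner's rows contiguous for one-pass grouping
select o.id, o.name as owner_name, p.name as pet_name
from owner o join pet p on o.id = p.owner_id
order by o.id;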
- multiple queries. First query to get owner IDs, then query pet names, and reconstruct data in my application.
select id, name from owner;
select owner_id, name from pet
where owner_id in (select id from owner);
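In practice I'd probably drop the subquery (the first query already pulled every owner id) and have the application supply the ids it actually needs, ordered so the two result sets can be stitched together in one pass. A sketch, with a placeholder id list:
select owner_id, name
from pet
where owner_id in (1, 2, 3) -- placeholder: ids collected from the first query
order by owner_id;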
All of these have drawbacks. In reality my data has millions of records, and each row in the equivalent of the 'owner' table above carries a couple of kB of data, so there is a very real cost to retrieving it redundantly.
Is there a "BKM" (best known method) for this scenario? My inclination is to go with the group by approach and just accept that I have to choose a magic token for the delimiter, but I'm wondering if I'm missing a better way to do this.