0

Here is a set of tables describing music composers :

CREATE TABLE IF NOT EXISTS `compositors` (
`id` int(11) NOT NULL,
  `name` varchar(45) NOT NULL COMMENT 'Nom et Prenom',
  `birth_date` varchar(45) DEFAULT NULL,
  `death_date` varchar(45) DEFAULT NULL,
  `birth_place` varchar(45) DEFAULT NULL,
  `death_place` varchar(45) DEFAULT NULL,
  `gender` enum('M','F') DEFAULT NULL,
  `century` varchar(45) DEFAULT NULL,
  `country` int(11) DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=28741 DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `compositor_biography` (
`index` int(11) NOT NULL,
  `compositor_id` int(11) NOT NULL,
  `url` varchar(255) DEFAULT NULL
) ENGINE=InnoDB AUTO_INCREMENT=15325 DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `compositor_comments` (
  `compositor_id` int(11) NOT NULL,
  `comment` text NOT NULL,
  `public` enum('Publique','Privé') NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;


CREATE TABLE IF NOT EXISTS `compositor_country` (
  `compositor_id` int(11) NOT NULL,
  `country_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Here are my indexes :

--
-- Index pour la table `compositors`
--
ALTER TABLE `compositors` ADD PRIMARY KEY (`id`), ADD KEY `countries` (`country`);
ALTER TABLE `compositor_biography` ADD PRIMARY KEY (`index`), ADD KEY `index` (`compositor_id`);
ALTER TABLE `compositor_comments` ADD KEY `c_compositor_idx` (`compositor_id`);

And finally sample data :

INSERT INTO `compositors` (`id`, `name`, `birth_date`, `death_date`, `birth_place`, `death_place`, `gender`, `century`, `country`) VALUES
(1, 'Dummy Compositor', '1606', '1676', 'Bruxellesss', NULL, 'F', '17', 11);

INSERT INTO `compositor_biography` (`index`, `compositor_id`, `url`) VALUES
(15322, 1, 'Dummy Link 1'),
(15323, 1, 'Dummy Link 2'),
(15324, 1, 'Dummy Link 3');

INSERT INTO `compositor_comments` (`compositor_id`, `comment`, `public`) VALUES
(1, 'Dummy Comment', 'Privé');

Here is an example query that my PHP script generate :

SELECT DISTINCT compositors.id, compositors.name, compositors.birth_date, compositors.death_date, compositors.birth_place, compositors.death_place, compositors.gender, compositors.century, compositors.country,  
GROUP_CONCAT( compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url, 
GROUP_CONCAT( compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment, 
GROUP_CONCAT( compositor_comments.public + 0 SEPARATOR ';') AS concat_compositor_comments_public 
FROM compositors 
LEFT JOIN compositor_biography ON compositors.id = compositor_biography.compositor_id 
LEFT JOIN compositor_comments ON compositors.id = compositor_comments.compositor_id 
GROUP BY compositors.id

However, this one has a problem, if you execute this query, you can see that in the concat_compositor_comments_comment column, you have that result :

Dummy Comment;Dummy Comment;Dummy Comment

but there is only one actual comment.

I didn't really understand what was the problem there, but it seemed like it was the GROUP BY. It should have one GROUP BY per JOIN - according to the second answer at Multiple GROUP_CONCAT on different fields using MySQL -- so I did it, and it worked, with this query :

SELECT DISTINCT compositors.id,
    compositors.NAME,
    compositors.birth_date,
    compositors.death_date,
    compositors.birth_place,
    compositors.death_place,
    compositors.gender,
    compositors.century,
    compositors.country,
    concat_compositor_biography_url,
    concat_compositor_comments_comment,
    concat_compositor_comments_public
FROM compositors
LEFT JOIN (
    SELECT compositor_id,
        GROUP_CONCAT(compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url
    FROM compositor_biography
    GROUP BY compositor_biography.compositor_id
    ) compositor_biography ON compositors.id = compositor_biography.compositor_id
LEFT JOIN (
    SELECT compositor_id,
        GROUP_CONCAT(compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment,
        GROUP_CONCAT(compositor_comments.PUBLIC + 0 SEPARATOR ';') AS concat_compositor_comments_public
    FROM compositor_comments
    GROUP BY compositor_comments.compositor_id
    ) compositor_comments ON compositors.id = compositor_comments.compositor_id

However, this query has huge performance problem, since it doesn't use INDEXES, or at least it seems to scan all the tables, and with 24000 composers, it takes approx 420 second for that query, while the other ( which gives wrong results on GROUP BY ) takes 1 second.

How can I change the second query, so it uses correctly the index and doesn't scan all the tables ?

Here is a link to a SQL-Fiddle the database schema : http://sqlfiddle.com/#!2/6b0132


UPDATE

According to @phil_w, and after further testing, this query seems to work with very good performance :

SELECT a.id,
    a.name,
    a.concat_compositor_biography_url,
    b.concat_compositor_aliases_data,
    GROUP_CONCAT(compositor_comments.comment SEPARATOR ';') as concat_compositor_comments_comment,
    GROUP_CONCAT(compositor_comments.public + 0 SEPARATOR ';') as concat_compositor_comments_public
FROM (
    SELECT b.id,
    b.name,
    b.concat_compositor_biography_url,
    GROUP_CONCAT(compositor_aliases.data SEPARATOR ';') as concat_compositor_aliases_data
    FROM (
        SELECT compositors.id,
            compositors.name,
            GROUP_CONCAT(compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url
        FROM compositors
        LEFT JOIN compositor_biography ON compositors.id = compositor_biography.compositor_id
        GROUP BY compositors.id
    ) b
    LEFT JOIN compositor_aliases ON b.id = compositor_aliases.compositor_id
    GROUP BY b.id
) a
LEFT JOIN compositor_comments ON a.id = compositor_comments.compositor_id
GROUP BY a.id  

However, how would it be possible to have the same result in a more compact query ? ( by the way, shall I create a new question for that and make this one resolved ? )

Community
  • 1
  • 1

2 Answers2

1

This question has nothing to do with "indexes". The problem is that you have two joins and every combination of rows will be returned (ie you have 3 matching rows in the other join to compositor_biography).

The fix is simple - just add DISTINCT to the GROUP_CONCAT() function:

...
GROUP_CONCAT( DISTINCT compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment, 
...
Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • Won't work because for example, `compositor_comments.public` is an enum, so if I have 3 comments that are public, I will have only one value in my column `concat_compositor_comments_public` while the correct response would be "1;1;1" which isn't possible due to the DISTINCT keyword – Aymeric Joly Apr 10 '15 at 13:04
0

It's normal that you have the entry 3 times, because you have 3 rows in compositor_biography...

perhaps you could go step by step, first gathering the bio only:

  SELECT compositors.id, compositors.name, 
    GROUP_CONCAT( compositor_biography.url SEPARATOR ';') AS   concat_compositor_biography_url
    FROM compositors 
    LEFT JOIN compositor_biography ON compositors.id =   compositor_biography.compositor_id  
    GROUP BY compositors.id

then join the rest

select t.id, t.name,t.concat_compositor_biography_url,
GROUP_CONCAT( compositor_comments.comment SEPARATOR ';') AS concat_compositor_comments_comment
from (
  SELECT compositors.id, compositors.name, 
  GROUP_CONCAT( compositor_biography.url SEPARATOR ';') AS concat_compositor_biography_url
  FROM compositors 
  LEFT JOIN compositor_biography ON compositors.id = compositor_biography.compositor_id 
  GROUP BY compositors.id
) t
LEFT JOIN compositor_comments ON t.id = compositor_comments.compositor_id  

and so on...

I don't see why it would not use the index unless the table is small. Try 'explain select...' to confirm that.

phil_w
  • 1,204
  • 11
  • 8
  • It is missing a GROUP BY instruction at the end, after the LEFT JOIN, it needs a GROUP BY t.id else it doesn't work. – Aymeric Joly Apr 10 '15 at 12:43