8

Database schema

create table `questions` (
  `id` int not null auto_increment,
  `title` varchar(45) not null,
  primary key (`id`));

create table `tags` (
  `id` int not null auto_increment,
  `question_id` int not null,
  `name` varchar(45) not null,
  primary key (`id`));

create table `comments` (
  `id` int not null auto_increment,
  `question_id` int not null,
  `body` varchar(45) not null,
  primary key (`id`));

insert into questions (title) values
("title1"), ("title2"), ("title3");

insert into tags (question_id, name) values
(1, "javascript"), (1, "php"), (1, "c#"), (2, "mysql"), (2, "php"), (3, "c#");

insert into comments (question_id, body) values
(1, "comment1"), (1, "comment1"), (1, "comment2"), (3, "comment3");

That's how it looks visually:

questions table

| id |  title |
|----|--------|
|  1 | title1 |
|  2 | title2 |
|  3 | title3 |

tags table

| id | question_id |       name |
|----|-------------|------------|
|  1 |           1 | javascript |
|  2 |           1 |        php |
|  3 |           1 |         c# |
|  4 |           2 |      mysql |
|  5 |           2 |        php |
|  6 |           3 |         c# |

comments table

| id | question_id |     body |
|----|-------------|----------|
|  1 |           1 | comment1 |
|  2 |           1 | comment1 |
|  3 |           1 | comment2 |
|  4 |           3 | comment3 |

Each question has to have at least one tag. It can also have 0 or more comments. There can be two comments with the same body on one question.

Desired output

I want to select all questions, that is, their ids, titles, tags and comments.

The output should look like that:

| id |  title |       tags        |          comments          |
|----|--------|-------------------|----------------------------|
|  1 | title1 | c#,php,javascript | comment1,comment1,comment2 |
|  2 | title2 | php,mysql         | (null)                     |
|  3 | title3 | c#                | comment3                   |

Attempts to solve the problem

I tried the following query:

select questions.id, questions.title,
  group_concat(tags.name), group_concat(comments.body)
from questions
join tags on questions.id = tags.question_id
left join comments on questions.id = comments.question_id
group by questions.id

Unfortunately, it doesn't work as expected. It produces the following output:

| id |  title | group_concat(distinct tags.name) |                                                      group_concat(comments.body) |
|----|--------|----------------------------------|----------------------------------------------------------------------------------|
|  1 | title1 |                c#,php,javascript | comment1,comment1,comment1,comment2,comment2,comment2,comment1,comment1,comment1 |
|  2 | title2 |                        php,mysql |                                                                           (null) |
|  3 | title3 |                               c# |                                                                         comment3 |

As you see, for the first question I get each comment three times, because there are three tags on this question.

Also, the comments are in the wrong order. They should be in the same order as they were inserted, that is, comment1,comment1,comment2, not comment1,comment2,comment1.

I can't use distinct for comments, as there can be multiple comments with the same body on one question.

I know that this probably could be solved with nested selects, but as far as I know it would have a huge negative impact on the performance of the query.

SQL Fiddle

The SQL Fiddle with database schema and my query.

Michał Perłakowski
  • 88,409
  • 26
  • 156
  • 177

2 Answers2

11

You need to first aggregate and apply GROUP_CONCAT and then join:

select questions.id, questions.title,
       tags.name, comments.body
from questions
join (
   select question_id, group_concat(tags.name) as name
   from tags
   group by question_id
) tags on questions.id = tags.question_id
left join (
   select question_id, group_concat(comments.body) as body
   from comments
   group by question_id
) comments on questions.id = comments.question_id
Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98
  • 1
    Additionally GROUP_CONCAT can take an ORDER BY, e.g. `group_concat(comments.body order by comments.id)` – Caleth May 25 '16 at 09:01
  • Such an epic, yet simple solution that beats the issue of `sql_mode=only_full_group_by` in mysql! Thanks a ton for sharing! – Devner Sep 05 '19 at 13:20
1

You can aggregate using a sub query before the join. As you appear to have unique tags then it would appear you can avoid using a sub query for the tags, and just join those as you currently do:-

SELECT questions.id, 
        questions.title,
        GROUP_CONCAT(tags.name ORDER BY tags.id), 
        comments.body
FROM questions
LEFT OUTER JOIN tags ON questions.id = tags.question_id
LEFT OUTER JOIN 
(
   SELECT question_id, 
            GROUP_CONCAT(comments.body ORDER BY id) as body
   FROM comments
   GROUP BY question_id
) comments ON questions.id = comments.question_id
GROUP BY questions.id, 
        questions.title,
        comments.body

Possible that you might get away with a correlated sub query. This might be better it you have a very large number of questions, but will be limiting those you are interested in with a WHERE clause. Down side is I am not sure if MySQL will be clever enough to perform the correlated sub query once for each question, rather than once for each occurrence of the question.

SELECT questions.id, 
        questions.title,
        GROUP_CONCAT(tags.name ORDER BY tags.id), 
        (
            SELECT GROUP_CONCAT(comments.body ORDER BY id) 
            FROM comments
            WHERE questions.id = comments.question_id
            GROUP BY question_id
        ) AS body
FROM questions
LEFT OUTER JOIN tags ON questions.id = tags.question_id
GROUP BY questions.id, 
        questions.title,
        body
Kickstart
  • 21,403
  • 2
  • 21
  • 33
  • I don't think that tags should be `left join`, as each question has at least one tag. Also, why should I group by `questions.title` and `questions.body`? – Michał Perłakowski May 25 '16 at 09:30
  • If it HAS to have at least one tag then you can use an INNER JOIN. As to the GROUP BY, while MySQL would normally accept just doing a _GROUP BY questions.id_ , this does not comply with SQL standards (and would fail if you used this in other flavours of SQL), and depending on the configuration of MySQL it would fail (it is an option that MySQL can keep to the GROUP BY standards). – Kickstart May 25 '16 at 09:35