9
DROP TABLE IF EXISTS `table`;
CREATE TABLE `table` (
  `id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
  `text` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO `table` VALUES ('1', 'Unpacked reserved sir offering bed judgment may and quitting speaking. Is do be improved raptures offering required in replying raillery. Stairs ladies friend by in mutual an no. Mr hence chief he cause. Whole no doors on hoped. Mile tell if help they ye full name. \r\n\r\nLadyship it daughter securing procured or am moreover mr. Put sir she exercise vicinity cheerful wondered. Continual say suspicion provision you neglected sir curiosity unwilling. Simplicity end themselves increasing led day sympathize yet. General windows effects not are drawing man garrets. Common indeed garden you his ladies out yet. Preference imprudence contrasted to remarkably in on. Taken now you him trees tears any. Her object giving end sister except oppose. \r\n\r\nWas justice improve age article between. No projection as up preference reasonably delightful celebrated. Preserved and abilities assurance tolerably breakfast use saw. And painted letters forming far village elderly compact. Her rest west each spot his and you knew. Estate gay wooded depart six far her. Of we be have it lose gate bred. Do separate removing or expenses in. Had covered but evident chapter matters anxious.');
INSERT INTO `table` VALUES ('2', 'Unpacked reserved sir offering bed judgment may and quitting speaking. Is do be improved raptures offering required in replying raillery. Stairs ladies friend by in mutual an no. Mr hence chief he cause. Whole no doors on hoped. Mile tell if help they ye full name. \r\n\r\nLadyship it daughter securing procured or am moreover mr. Put sir she exercise vicinity cheerful wondered. Continual say suspicion provision you neglected sir curiosity unwilling. Simplicity end themselves increasing led day sympathize yet. General windows effects not are drawing man garrets. Common indeed garden you his ladies out yet. Preference imprudence contrasted to remarkably in on. Taken now you him trees tears any. Her object giving end sister except oppose. \r\n\r\nWas justice improve age article between. No projection as up preference reasonably delightful celebrated. Preserved and abilities assurance tolerably breakfast use saw. And painted letters forming far village elderly compact. Her rest west each spot his and you knew. Estate gay wooded depart six far her. Of we be have it lose gate bred. Do separate removing or expenses in. Had covered but evident chapter matters anxious');

When running a GROUP BY query without using GROUP_CONCAT() the result set is as expected (showing two rows, one for each variation of text):

SELECT
    `text`
FROM
    `table`
GROUP BY
    `text`;

+-----------------------------------+
| text                              |
+-----------------------------------+
| Unpacked reserved sir offering... |
| Unpacked reserved sir offering... |
+-----------------------------------+
2 rows in set (0.02 sec)

However, when running the same query with a GROUP_CONCAT() the result set is not as expected (showing one row with a concatenated string of the two id fields):

SELECT
    GROUP_CONCAT(`id` SEPARATOR ', ') AS ids
FROM
    `table`
GROUP BY
    `text`;

+------+
| ids  |
+------+
| 1, 2 |
+------+
1 row in set (0.00 sec)

My question:

Why would using GROUP_CONCAT() affect the number of rows returned?

My initial assumption was that GROUP_CONCAT_MAX_LEN has something to do with it (mine is set to 1024) but surely that only affects GROUP_CONCAT(), not GROUP BY (also, as you may notice, I'm using GROUP_CONCAT() on the id field, not the text field, and the result of that doesn't even come close to exceeding GROUP_CONCAT_MAX_LEN).

Michael
  • 11,912
  • 6
  • 49
  • 64
  • I ran your SQL and I get two rows, containing ids 1 and 2 respectively. So I cannot reproduce the issue. The most likely issue (in general) is text exceeding the size of the column and being truncated. – rjh Oct 12 '12 at 16:06
  • Your `text` fields don't have the same values. – Kermit Oct 12 '12 at 16:06
  • Yes, and yet when I include a GROUP_CONCAT() I get one row rather than two in the result set. – Michael Oct 12 '12 at 16:08
  • rjh: but how would adding a function to a selected field bypass/affect the group by condition? – Michael Oct 12 '12 at 16:14
  • To your edit: I would assume that GROUP_CONCAT_MAX_LEN value does not apply to GROUP BY, and thus it returns two rows becuase it doesn't truncate them. – JanT Oct 12 '12 at 18:22
  • Then both queries should return two rows because both queries have the same `GROUP BY` condition. And yet the second doesn't. – Michael Oct 12 '12 at 19:07
  • As an alternative (or maybe bug fix) you can use a MD5 Hash `MD5( text )` for the grouping part – Sir Rufo Jan 04 '13 at 15:02
  • Heh. Clever. Just so long as there're no collisions. – Michael Jan 04 '13 at 15:07

2 Answers2

5

You have to change the max_sort_length to higher number session wise or globally as per your need. By default its value is 1024 bytes and your string contains 1170 bytes data. By increasing the size it will give two rows for GROUP_CONCAT.

Check this link max_sort_length

SELECT `text` FROM `table` GROUP BY `text`;

SET SESSION max_sort_length = 2000;
SELECT GROUP_CONCAT(`id` SEPARATOR ', ') AS ids FROM `table` GROUP BY `text`;

Check the SQL FIDDLE DEMO

EDIT: BLOB and TEXT values can't reliably be used in GROUP BY, ORDER BY or DISTINCT. Only the first max_sort_length bytes are used when comparing BLOB values in these cases. The default value of max_sort_length is 1024 and can be changed at server start-up time or at run time.

Saharsh Shah
  • 28,687
  • 8
  • 48
  • 83
  • That does indeed solve the problem. Could you explain why the `max_sort_limit` only comes into play when using `GROUP_CONCAT`? I'd have thought that if there was a limit on the length of the data used to group a result set then it would apply regardless of what you're selecting. – Michael Jan 04 '13 at 11:36
  • @MichaelRushton Check this link http://www.linuxtopia.org/online_books/database_guides/mysql_5.1_database_reference_guide/open-bugs.html – Saharsh Shah Jan 04 '13 at 12:02
  • Thanks, although from what I understand this issue seems to imply that the first query (where I *don't* use `GROUP_CONCAT()`) will *also* return just one row -- because I'm still grouping by a field which exceeds the `max_sort_length`. – Michael Jan 04 '13 at 12:17
  • Basically, I understand why the second query returns one row; I *don't* understand why the first query returns two. – Michael Jan 04 '13 at 12:23
  • As per group concat works in mysql http://dev.mysql.com/doc/refman/5.1/en/group-by-functions.html#function_group-concat – Saharsh Shah Jan 04 '13 at 12:40
  • 1
    Group by don't uses order by but group concat uses default order by column which you use in group by. So when it uses order by it sorts the data and check for 1024 bytes data rather that checks for whole data. – Saharsh Shah Jan 04 '13 at 12:42
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/22182/discussion-between-saharsh-shah-and-michaelrushton) – Saharsh Shah Jan 04 '13 at 12:42
1

It seems you're running into MySQL's default GROUP_CONCAT_MAX_LEN. Your string has a length of 1178, which is definitely over the default value of 1024. That means, if the values differ by something later than the 1024, MySQL will simply ignore it, because the first 1024 characters are exactly identical. This is a limit on GROUP_CONCAT's behavior, not on GROUP.

You can make this bigger in the my.cnf file for MySQL.

See here for some more details:

http://www.coderanch.com/t/422632/JDBC/databases/increase-group-concat-max-len

mjuarez
  • 16,372
  • 11
  • 56
  • 73
  • Definitely right answer, and also assumption: only 1024 chars are passed to GROUP BY(by default), but you can check that by making change at beginning of the line, and results will be as you expected. – JanT Oct 12 '12 at 16:32
  • Not sure if this is quite it. If you truncate the values to below 1024 the behavior continues. – Kermit Oct 12 '12 at 16:33
  • You need not only truncate them but also make them unique.. I tried that and it retuned two lines as he wanted. – JanT Oct 12 '12 at 16:41
  • But performing a function on a selected field surely shouldn't decrease (or increase) the number of rows returned? – Michael Oct 12 '12 at 17:09
  • Also, I'm performing `GROUP_CONCAT()` on `id`, not `text`. – Michael Oct 12 '12 at 17:22