2

I would like to get rid of duplicates in my DB. There can be several duplicates of one criterion, which are then grouped together.

Let's say B is duplicate of A, and C is also duplicate of A then there should be a result like

*id* | *duplicate*     
  A  | B, C

But now the result is like:

*id* | *duplicate* 
  A  | B, C    
  B  | C

Which is correct of course. The problem is, that I would like that ids which already appeared as duplicates in the results were not listed again in the column id with their own duplicates.

Here is an example: http://sqlfiddle.com/#!9/61692/1/0

Any suggestions?

Thanks, Paul

Edit:

And here the source of the example (as recommended by Zohar Peled):

CREATE TABLE duplicates
    (`id` int, `Name` varchar(7))
;

INSERT INTO duplicates
    (`id`, `Name`)
VALUES
    (1, 'Bob'),
    (2, 'Bob'),
    (3, 'Bob'),
    (4, 'Alice')
;

SELECT DISTINCT d1.`id`, GROUP_CONCAT(d2.`id`) as duplicates
FROM `duplicates` as d1, `duplicates` as d2 
WHERE 
d1.`id`< d2.`id` AND
d1.`Name`       = d2.`Name`   
GROUP BY d1.`id`
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • 1
    Creating an example on sqlfiddle is great, but what if ir's down? I suggest you copy the ddl and dml here as well. – Zohar Peled Apr 30 '15 at 18:59

1 Answers1

1

This is a rather unorthodox solution, but hey...

SELECT MIN(x.id) id
     , GROUP_CONCAT(DISTINCT y.id) duplicates
  FROM duplicates x 
  JOIN duplicates y
    ON y.name = x.name 
   AND y.id > x.id
 GROUP
    BY x.name
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • Well this looks good! As explanation for others: The main "tricks" seem to be: the MIN() which makes sure there are no duplicates like: id | duplicates A | B, C, D B | A, C, D And the GROUP BY which may NOT be id, but one of the search criteria. Thank you! –  May 01 '15 at 09:22
  • And a link to another successful example: http://sqlfiddle.com/#!9/3c592/7/0 With Strawberry´s solution –  May 01 '15 at 09:32