132

I have a MySQL database table with two columns that interest me. Individually they can each have duplicates, but the combination of the two should never be duplicated.

stone_id can have duplicates as long as the upcharge_title is different for each one, and vice versa. But, for example, the combination stone_id = 412 and upcharge_title = "sapphire" should only occur once.

This is ok:

stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "ruby"

This is NOT ok:

stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "sapphire"

Is there a query that will find duplicates across both fields? And if possible, is there a way to set up my database to disallow that?

I am using MySQL version 4.1.22

Eric Leschinski
JD Isaacks

7 Answers

253

You should set up a composite key between the two fields. This will require the combination of stone_id and upcharge_title to be unique for each row.
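In MySQL that composite key can be added with a unique index; a sketch, where `your_table` is a placeholder for your actual table name:

```sql
-- Adds a composite unique key: MySQL will reject any new row that
-- repeats an existing (stone_id, upcharge_title) pair.
ALTER TABLE your_table
    ADD UNIQUE KEY uq_stone_upcharge (stone_id, upcharge_title);
```

Note that creating this index will fail if duplicates already exist, so remove them first (see below).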

As far as finding the existing duplicates try this:

select   stone_id,
         upcharge_title,
         count(*)
from     your_table
group by stone_id,
         upcharge_title
having   count(*) > 1
Miyagi Coder
  • 1
    Thank you, that does select them. Could you be so kind as to tell me how to delete duplicates (but leave 1 copy of course) THANK YOU!! – JD Isaacks Mar 13 '09 at 14:06
  • 3
    One way would be to grab all the distinct data and recreate the table. – Miyagi Coder Mar 13 '09 at 14:48
  • 1
    @John Isaacks: If there are no other fields with which you could distinguish them (i.e. all fields are duplicates), then you'll have to delete both rows and recreate one. One way would be to copy duplicates into a copy of the table, delete them from the original, and reinsert distinct rows from the copy. – P Daddy Mar 13 '09 at 23:24
  • This does not work on postgres 8.1, could someone give me a hand on that? – Lennon Dec 17 '15 at 13:20
  • great thanks, does the order that you group by matter? – Andrew Jan 27 '17 at 20:56
  • And why does adding another column, just for data selection purposes, cause no results to appear, when without the extra column it does appear? – Andrew Jan 27 '17 at 21:06
  • 1
    Nice! ☺ Is there a way to show the `id`s as well? In my case the duplicates are in `first_name` and `last_name` but the `id`s are different. In the query you typed, I can see clearly those repeated ones, but I only see one of the `id`s. How do you do to make it show the other `id`s? – Pathros Apr 26 '17 at 16:42
  • @PDaddy how would you delete the duplicates if another column was unique? Let's say in this case there is another column called "primary" and based on the ones that have a value of 1 stay and the others you might want to delete. Sorry that I am asking this after 7 years but I encountered this just now and I am in a similar situation. – Aurel Drejta Apr 23 '20 at 15:27
  • @AurelDrejta: I've tried to explain your options [in this db-fiddle](https://www.db-fiddle.com/f/oevWzhB1jGLbUKaV3oASiC/0). If this doesn't help, or if it raises more questions, I'd recommend asking a new question. – P Daddy Apr 24 '20 at 03:55
37

I found it helpful to add a unique index using ALTER IGNORE, which removes the duplicates and enforces unique records, which sounds like what you would like to do. So the syntax would be:

ALTER IGNORE TABLE `table` ADD UNIQUE INDEX(`id`, `another_id`, `one_more_id`);

This effectively adds the unique constraint, meaning you will never have duplicate records, and the IGNORE deletes the existing duplicates.

You can read more about ALTER IGNORE here: http://mediakey.dk/~cc/mysql-remove-duplicate-entries/

Update: I was informed by @Inquisitive that this may fail in versions of MySQL > 5.5:

It fails on MySQL > 5.5 and on InnoDB tables, and in Percona, because of their InnoDB fast index creation feature [http://bugs.mysql.com/bug.php?id=40344]. In this case, first run `set session old_alter_table=1` and then the above command will work fine.

Update - ALTER IGNORE Removed In 5.7

From the docs

As of MySQL 5.6.17, the IGNORE clause is deprecated and its use generates a warning. IGNORE is removed in MySQL 5.7.

One of the MySQL devs gives two alternatives:

  • Group by the unique fields and delete as seen above
  • Create a new table, add a unique index, use INSERT IGNORE, ex:
CREATE TABLE duplicate_row_table LIKE regular_row_table;
ALTER TABLE duplicate_row_table ADD UNIQUE INDEX (id, another_id);
INSERT IGNORE INTO duplicate_row_table SELECT * FROM regular_row_table;
DROP TABLE regular_row_table;
RENAME TABLE duplicate_row_table TO regular_row_table;

But depending on the size of your table, this may not be practical.
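For the first alternative (group by the unique fields and delete), one possible shape using MySQL's multi-table DELETE; this is a sketch that assumes the table also has an auto-increment primary key, here called `pk` (the other column names follow the example above):

```sql
-- Keep the row with the lowest pk in each (id, another_id) group
-- and delete the rest. `pk` is an assumed auto-increment primary
-- key column; substitute your real primary key.
DELETE t
FROM   regular_row_table t
JOIN  (SELECT MIN(pk) AS keep_pk, id, another_id
       FROM   regular_row_table
       GROUP  BY id, another_id) k
  ON   t.id = k.id
 AND   t.another_id = k.another_id
 AND   t.pk <> k.keep_pk;
```

If the table has no such key, the create-table/INSERT IGNORE/rename approach above is the safer route.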

SeanDowney
  • 1
    True, but at least for next time you know. I had the same issue and thought it good to share with others – SeanDowney Jun 15 '12 at 18:12
  • I was only teasing about it being 3 years late. Really am glad you shared. Hence the plus 1. – JD Isaacks Jun 16 '12 at 00:43
  • I imagine this removes one of the duplicates arbitrarily so make sure there is not differing data between each row that might be useful to know or keep. – Joshua Pinter Nov 30 '12 at 00:24
  • +1 for the answer even after 2 years late. I accidentally deleted a composite key and this was a life saver. Thank you – ivcode Jun 13 '14 at 07:14
  • I have tried a few of duplication finder techniques and none of them was this simple and fast. Thank you for sharing this method. – Kristjan O. Sep 29 '14 at 23:25
  • Thanks for the useful info. Although I just ran this using phpMyAdmin and received this warning. Warning: #1681 'IGNORE' is deprecated and will be removed in a future release. – TheWebsiteGuy Jan 19 '19 at 13:13
  • Yes - I'll add a comment stating such – SeanDowney Mar 19 '19 at 19:25
11

You can find duplicates like this:

select
    stone_id, upcharge_title, count(*)
from
    particulartable
group by
    stone_id, upcharge_title
having
    count(*) > 1
Restore the Data Dumps
6

To find the duplicates:

select stone_id, upcharge_title from tablename group by stone_id, upcharge_title having count(*)>1

To prevent this in the future, create a composite unique key on these two fields.

Ian Nelson
  • 1
    Thank you so much, can you please tell me how to delete all but one of the duplicates. And how do I set up a composite key in phpMyAdmin. THANK YOU!!! – JD Isaacks Mar 13 '09 at 13:45
5

Incidentally, a composite unique constraint on the table would prevent this from occurring in the first place.

ALTER TABLE table
    ADD UNIQUE(stone_id, upcharge_title)

(This is valid T-SQL. Not sure about MySQL.)

P Daddy
2

This SO post helped me, but I also wanted to know how to delete the duplicates and keep one of the rows. Here's a PHP solution that deletes the duplicate rows and keeps one (in my case there were only two columns, and it's in a function for clearing duplicate category associations):

$dupes = $db->query('select *, count(*) as NUM_DUPES from PRODUCT_CATEGORY_PRODUCT group by fkPRODUCT_CATEGORY_ID, fkPRODUCT_ID having count(*) > 1');
if (!is_array($dupes))
    return true;
foreach ($dupes as $dupe) {
    $db->query('delete from PRODUCT_CATEGORY_PRODUCT where fkPRODUCT_ID = ' . $dupe['fkPRODUCT_ID'] . ' and fkPRODUCT_CATEGORY_ID = ' . $dupe['fkPRODUCT_CATEGORY_ID'] . ' limit ' . ($dupe['NUM_DUPES'] - 1));
}

The `limit NUM_DUPES - 1` clause is what preserves a single row from each duplicate set.

thanks all

groovenectar
  • 5
    `ALTER IGNORE TABLE table ADD UNIQUE INDEX index_name(stone_id, charge_title)` will remove duplicate rows leaving only one unique pair. – dev-null-dweller Aug 19 '10 at 21:38
  • 1
    @dev-null-dweller - You saved me hours of dealing with this programmatically in C#. Thank you. - Working - MariaDB 10.5.16 – WLFree Oct 04 '22 at 23:09
0

This is what worked for me (ignoring null and blank). Two different email columns:

SELECT * 
FROM   members 
WHERE  email IN (SELECT soemail 
                 FROM   members 
                 WHERE  NOT Isnull(soemail) 
                        AND soemail <> ''); 
Mark Gerrior