132

I have a MySQL database table with two columns that interest me. Individually they can each have duplicates, but the combination of the two should never be duplicated.

stone_id can have duplicates as long as the upcharge_title is different for each one, and vice versa. But, for example, the combination stone_id = 412 and upcharge_title = "sapphire" should only occur once.

This is ok:

stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "ruby"

This is NOT ok:

stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "sapphire"

Is there a query that will find duplicates across both fields? And if possible, is there a way to set up my database to disallow that?

I am using MySQL version 4.1.22

Eric Leschinski
JD Isaacks

7 Answers

253

You should set up a composite key between the two fields. This will require the combination of stone_id and upcharge_title to be unique for each row.
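In MySQL that composite key can be added with a unique index; a sketch, where `your_table` is a placeholder for your actual table name:

```sql
-- Adds a composite unique key: MySQL will reject any new row that
-- repeats an existing (stone_id, upcharge_title) pair.
ALTER TABLE your_table
    ADD UNIQUE KEY uq_stone_upcharge (stone_id, upcharge_title);
```

Note that creating this index will fail if duplicates already exist, so remove them first (see below).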

As far as finding the existing duplicates try this:

select   stone_id,
         upcharge_title,
         count(*)
from     your_table
group by stone_id,
         upcharge_title
having   count(*) > 1
Miyagi Coder
  • 1
    Thank you, that does select them. Could you be so kind as to tell me how to delete duplicates (but leave 1 copy of course) THANK YOU!! – JD Isaacks Mar 13 '09 at 14:06
  • 3
    One way would be to grab all the distinct data and recreate the table. – Miyagi Coder Mar 13 '09 at 14:48
  • 1
    @John Isaacks: If there are no other fields with which you could distinguish them (i.e. all fields are duplicates), then you'll have to delete both rows and recreate one. One way would be to copy duplicates into a copy of the table, delete them from the original, and reinsert distinct rows from the copy. – P Daddy Mar 13 '09 at 23:24
  • This does not work on postgres 8.1, could someone give me a hand on that? – Lennon Dec 17 '15 at 13:20
  • great thanks, does the order that you group by matter? – Andrew Jan 27 '17 at 20:56
  • And why does adding another column, just for data selection purposes, cause no results to appear, when without the extra column it does appear? – Andrew Jan 27 '17 at 21:06
  • 1
    Nice! ☺ Is there a way to show the `id`s as well? In my case the duplicates are in `first_name` and `last_name` but the `id`s are different. In the query you typed, I can see clearly those repeated ones, but I only see one of the `id`s. How do you do to make it show the other `id`s? – Pathros Apr 26 '17 at 16:42
  • @PDaddy how would you delete the duplicates if another column was unique? Let's say in this case there is another column called "primary" and based on the ones that have a value of 1 stay and the others you might want to delete. Sorry that I am asking this after 7 years but I encountered this just now and I am in a similar situation. – Aurel Drejta Apr 23 '20 at 15:27
  • @AurelDrejta: I've tried to explain your options [in this db-fiddle](https://www.db-fiddle.com/f/oevWzhB1jGLbUKaV3oASiC/0). If this doesn't help, or if it raises more questions, I'd recommend asking a new question. – P Daddy Apr 24 '20 at 03:55
37

I found it helpful to add a unique index using ALTER IGNORE, which removes the duplicates and enforces unique records, which sounds like what you would like to do. So the syntax would be:

ALTER IGNORE TABLE `table` ADD UNIQUE INDEX(`id`, `another_id`, `one_more_id`);

This effectively adds the unique constraint, meaning you will never have duplicate records, and the IGNORE deletes the existing duplicates.

You can read more about ALTER IGNORE here: http://mediakey.dk/~cc/mysql-remove-duplicate-entries/

Update: I was informed by @Inquisitive that this may fail in versions of MySQL > 5.5:

It fails on MySQL > 5.5 and on InnoDB tables, and in Percona, because of their InnoDB fast index creation feature [http://bugs.mysql.com/bug.php?id=40344]. In this case, first run `set session old_alter_table=1` and then the above command will work fine.

Update - ALTER IGNORE Removed In 5.7

From the docs

As of MySQL 5.6.17, the IGNORE clause is deprecated and its use generates a warning. IGNORE is removed in MySQL 5.7.

One of the MySQL devs gives two alternatives:

  • Group by the unique fields and delete as seen above
  • Create a new table, add a unique index, use INSERT IGNORE, ex:
CREATE TABLE duplicate_row_table LIKE regular_row_table;
ALTER TABLE duplicate_row_table ADD UNIQUE INDEX (id, another_id);
INSERT IGNORE INTO duplicate_row_table SELECT * FROM regular_row_table;
DROP TABLE regular_row_table;
RENAME TABLE duplicate_row_table TO regular_row_table;

But depending on the size of your table, this may not be practical.
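For the first alternative (group by the unique fields and delete), one possible shape using MySQL's multi-table DELETE; this is a sketch that assumes the table also has an auto-increment primary key, here called `pk` (the other column names follow the example above):

```sql
-- Keep the row with the lowest pk in each (id, another_id) group
-- and delete the rest. `pk` is an assumed auto-increment primary
-- key column; substitute your real primary key.
DELETE t
FROM   regular_row_table t
JOIN  (SELECT MIN(pk) AS keep_pk, id, another_id
       FROM   regular_row_table
       GROUP  BY id, another_id) k
  ON   t.id = k.id
 AND   t.another_id = k.another_id
 AND   t.pk <> k.keep_pk;
```

If the table has no such key, the create-table/INSERT IGNORE/rename approach above is the safer route.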

SeanDowney
  • 1
    True, but at least for next time you know. I had the same issue and thought it good to share with others – SeanDowney Jun 15 '12 at 18:12
  • I was only teasing about it being 3 years late. Really am glad you shared. Hence the plus 1. – JD Isaacks Jun 16 '12 at 00:43
  • I imagine this removes one of the duplicates arbitrarily so make sure there is not differing data between each row that might be useful to know or keep. – Joshua Pinter Nov 30 '12 at 00:24
  • +1 for the answer even after 2 years late. I accidentally deleted a composite key and this was a life saver. Thank you – ivcode Jun 13 '14 at 07:14
  • I have tried a few of duplication finder techniques and none of them was this simple and fast. Thank you for sharing this method. – Kristjan O. Sep 29 '14 at 23:25
  • Thanks for the useful info. Although I just ran this using phpMyAdmin and received this warning. Warning: #1681 'IGNORE' is deprecated and will be removed in a future release. – TheWebsiteGuy Jan 19 '19 at 13:13
  • Yes - I'll add a comment stating such – SeanDowney Mar 19 '19 at 19:25
11

You can find duplicates like this:

select
    stone_id, upcharge_title, count(*)
from
    particulartable
group by
    stone_id, upcharge_title
having
    count(*) > 1
Restore the Data Dumps
6

To find the duplicates:

select stone_id, upcharge_title from tablename group by stone_id, upcharge_title having count(*)>1

To prevent this in the future, create a composite unique key on these two fields.

Ian Nelson
  • 1
    Thank you so much, can you please tell me how to delete all but one of the duplicates. And how do I set up a composite key in phpMyAdmin. THANK YOU!!! – JD Isaacks Mar 13 '09 at 13:45
5

Incidentally, a composite unique constraint on the table would prevent this from occurring in the first place.

ALTER TABLE table
    ADD UNIQUE(stone_id, upcharge_title)

(This is valid T-SQL. Not sure about MySQL.)

P Daddy
2

This SO post helped me, but I also wanted to know how to delete the duplicates and keep one of the rows. Here's a PHP solution that deletes the duplicate rows and keeps one (in my case there were only two columns, and it's in a function for clearing duplicate category associations):

$dupes = $db->query('select *, count(*) as NUM_DUPES from PRODUCT_CATEGORY_PRODUCT group by fkPRODUCT_CATEGORY_ID, fkPRODUCT_ID having count(*) > 1');
if (!is_array($dupes))
    return true;
foreach ($dupes as $dupe) {
    $db->query('delete from PRODUCT_CATEGORY_PRODUCT where fkPRODUCT_ID = ' . $dupe['fkPRODUCT_ID'] . ' and fkPRODUCT_CATEGORY_ID = ' . $dupe['fkPRODUCT_CATEGORY_ID'] . ' limit ' . ($dupe['NUM_DUPES'] - 1));
}

The `limit NUM_DUPES - 1` clause is what preserves a single row from each duplicate set.

thanks all

groovenectar
  • 5
    `ALTER IGNORE TABLE table ADD UNIQUE INDEX index_name(stone_id, charge_title)` will remove duplicate rows leaving only one unique pair. – dev-null-dweller Aug 19 '10 at 21:38
  • 1
    @dev-null-dweller - You saved me hours of dealing with this programmatically in C#. Thank you. - Working - MariaDB 10.5.16 – WLFree Oct 04 '22 at 23:09
0

This is what worked for me (ignoring null and blank). Two different email columns:

SELECT * 
FROM   members 
WHERE  email IN (SELECT soemail 
                 FROM   members 
                 WHERE  NOT Isnull(soemail) 
                        AND soemail <> ''); 
Mark Gerrior