Finding "duplicate" rows that differ in one column

Question

I have a table like the following in MySQL 5.1:

+--------------+----------------+------+-----+---------+----------------+
| Field        | Type           | Null | Key | Default | Extra          |
+--------------+----------------+------+-----+---------+----------------+
| log_id       | int(11)        | NO   | PRI | NULL    | auto_increment |
| date         | datetime       | NO   | MUL | NULL    |                |
| date_millis  | int(3)         | NO   |     | NULL    |                |
| eib_address  | varchar(20)    | NO   |     | NULL    |                |
| ip_address   | varchar(15)    | NO   |     | NULL    |                |
| value        | decimal(20,10) | NO   | MUL | NULL    |                |
| application  | tinyint(4)     | NO   |     | NULL    |                |
| phys_address | varchar(20)    | NO   |     | NULL    |                |
| orig_log_id  | bigint(20)     | NO   |     | NULL    |                |
+--------------+----------------+------+-----+---------+----------------+

In this table, log_id and orig_log_id are always unique. It is possible that two rows may have duplicate values for any of the other fields, though. Ignoring the *log_id fields, our problem is that two rows may be identical in all other columns, but have differing values for value. I am trying to figure out the correct SQL query to identify when two (or more) rows have identical values for date, date_millis and eib_address, but different values for value, log_id and orig_log_id. So far, I've been able to come up with a query that accomplishes the first clause in my previous sentence:

SELECT main.* 
FROM sensors_log main
INNER JOIN 
    (SELECT date, date_millis, eib_address 
    FROM sensors_log 
    GROUP BY date, date_millis, eib_address 
    HAVING count(eib_address) > 1) dupes 
ON main.date = dupes.date 
    AND main.date_millis = dupes.date_millis 
    AND main.eib_address = dupes.eib_address;

However, I can't seem to figure out when value differs. I at least know that just throwing AND main.value != dupes.value into the ON clause doesn't do it!

score 7 · Accepted Answer · answered Mar 23 '11 at 16:10

7

I think it's a bit simpler than you're trying to make it. Try this:

SELECT *
  FROM SENSORS_LOG s1
  INNER JOIN SENSORS_LOG s2
    ON (s2.DATE = s1.DATE AND
        s2.DATE_MILLIS = s1.DATE_MILLIS AND
        s2.EIB_ADDRESS = s1.EIB_ADDRESS)
  WHERE s1.VALUE <> s2.VALUE OR
        s1.LOG_ID <> s2.LOG_ID OR
        s1.ORIG_LOG_ID <> s2.ORIG_LOG_ID;

Share and enjoy.

answered Mar 23 '11 at 16:10

Bob Jarvis - Слава Україні

48,992
9
77
110

This is more or less what I need. I'd rather not have the full contents of both tables, but it'll do the trick. Thanks. – GarlicFries Mar 23 '11 at 16:47
That makes it so simple! Just what I was looking for, as well. I obviously selected only the columns I cared about, but the universality is great in this solution. – missscripty Jul 07 '16 at 17:05

score 1 · Answer 2 · answered Mar 23 '11 at 16:37

Maybe I mistook the problem, but can't you just perform a COUNT like this?

SELECT date, date_millis, eib_address, count(*) as nr_dupes
FROM sensors_log
GROUP BY date, date_millis, eib_address
HAVING count(*) > 1

or

SELECT date, date_millis, eib_address, 
       group_concat(value), group_concat(log_id), group_concat(orig_log_id)
FROM sensors_log
GROUP BY date, date_millis, eib_address
HAVING count(*) > 1

Finding "duplicate" rows that differ in one column

2 Answers2