MySQL distinct query returns rows with duplicate information, need deduplication

Question

I have a table similar to the one shown below in a MySQL database:

+----------+----------+----------+----------+----------+  
| Column_A | Column_B | Column_C | Column_D | Column_E |      
+----------+----------+----------+----------+----------+      
|        1 |       11 | a        |        0 | abc      |      
|        2 |       22 | a        |        0 | abc      |      
|        3 |       33 | a        |        0 | def      |      
|        4 |       44 | b        |        0 | def      |      
|        5 |          | b        |        0 | def      |      
|        6 |       55 | c        |        0 | ghi      |      
|        7 |          | d        |        0 | jkl      |      
|        8 |          | a        |        4 | abc      |      
|        9 |          | a        |        4 | abc      |      
|       10 |          | b        |        4 | abc      |      
|       11 |       88 | f        |        4 | xyz      |      
|       12 |          | f        |        4 | xyz      |      
+----------+----------+----------+----------+----------+

I need a result similar to the one below (i.e only a & b values have different column D & E values):

+----------+----------+----------+
| Column_C | Column_D | Column_E |
+----------+----------+----------+
| a        |        0 | abc      |
| a        |        0 | def      |
| a        |        4 | abc      |
| b        |        0 | def      |
| b        |        4 | abc      |
+----------+----------+----------+

I have tried this query:

SELECT DISTINCT column_c,column_d,column_e FROM trial2 ORDER BY column_c;

I get this:

+------------------+------------------+------------------+
|     column_c     |     column_d     |     column_e     |
+------------------+------------------+------------------+
|     a            |            0     |     abc          |
|     a            |            0     |     def          |
|     a            |            4     |     abc          |
|     b            |            0     |     def          |
|     b            |            4     |     abc          |
|     c            |            0     |     ghi          |
|     d            |            0     |     jkl          |
|     f            |            4     |     xyz          |
+------------------+------------------+------------------+

I do not need the rows with 'c', 'd' or 'f' in column_c. I need rows which have both 0 & 4 values in column_d (i.e. column_c is 'a' or 'b').

Show your working, include the queries you have tried in an edit to the question — random, Nov 29 '11 at 04:31
Don't forget to register on SO with the same details you have on Super User — random, Nov 29 '11 at 04:55

score 1 · Answer 1 · answered Nov 29 '11 at 08:25

You don't need to join...

SELECT column_c,column_d,column_e FROM trial2 
GROUP by column_c, column_d, column_e 
HAVING count (*) > 1 
ORDER BY column_c

The having clause operates after the aggregate is applied, so you can filter on how many rows are left after grouping...

outis · Answer 2 · 2011-11-29T05:39:23.233

0

DISTINCT merely ensures a row appears at most once in the output. It won't remove rows that don't exactly match other rows.

To operate on more than one row at once, you'll need an inner join:

SELECT t.C, t.D, t.E
  FROM trial2 AS t
    JOIN trial2 AS tb
      ON t.C=tb.C AND (t.D != tb.D OR t.E != t.E)
  GROUP BY t.C, t.D, t.E
  ORDER BY t.C;

An inner join filters out rows that don't have a matching row. In the above query, matching rows are those that have the same value for column C, but differ in column D or E.

edited Nov 29 '11 at 05:39

answered Nov 29 '11 at 05:21

outis

75,655
22
151
221

@Prasad: more importantly, you should understand why it works. Understanding joins is crucial for working with relational databases, but is beyond the scope of this question. Find a good book or website that covers relational databases; there are a couple SO Q&As that have suggestions. – outis Nov 29 '11 at 06:53

score 0 · Answer 3 · answered Nov 29 '11 at 05:27

Not too sure about this solution, but I think it does what you want.

mysql> select * from randdata;
+------+------+------+
| a    | b    | c    |
+------+------+------+
| a    | 0    | f    |
| a    | 2    | x    |
| b    | 2    | x    |
| c    | 0    | f    |
+------+------+------+
4 rows in set (0.00 sec)

mysql> select * from randdata GROUP BY concat(b,c);
+------+------+------+
| a    | b    | c    |
+------+------+------+
| a    | 0    | f    |
| a    | 2    | x    |
+------+------+------+
2 rows in set (0.01 sec)

Query:

select * from trial2 GROUP BY concat(column_d,column_e);

For the sample, `GROUP BY b,c` should do the same thing as `GROUP BY concat(b,c)`. More generally, the latter will group things together that the former won't. Lastly, `GROUP BY` suffers the same problem as `DISTINCT` regarding Prasad's requirements: neither will remove rows that always have the same values for columns D and E for a given value in column C. — outis, Nov 29 '11 at 05:47

MySQL distinct query returns rows with duplicate information, need deduplication

3 Answers3