Duplicated records for a column

Question

I'm trying to find the rows whose col1 value duplicates the col1 of a row with a certain col2 value.

Suppose I have this table:

+----+------------+----------+
| id | col1       | col2     |
+----+------------+----------+
|  1 | 5          | 2        |
|  2 | 5          | 1        |
|  3 | 8          | 4        |
|  4 | 8          | 1        |
|  5 | 8          | 3        |
|  6 | 5          | 2        |
|  7 | 2          | 3        |
|  8 | 1          | 4        |
|  9 | 2          | 2        |
| 10 | 5          | 2        |
| 11 | 5          | 3        |
| 12 | 3          | 1        |
+----+------------+----------+

My query should return these rows when col2 = 1:

+----+------------+----------+
| id | col1       | col2     |
+----+------------+----------+
|  1 | 5          | 2        |
|  6 | 5          | 2        |
| 10 | 5          | 2        |
| 11 | 5          | 3        |
|  3 | 8          | 4        |
|  5 | 8          | 3        |
+----+------------+----------+

I have tried this query and it works pretty well for me:

SELECT DISTINCT b.*
FROM table a, table b
WHERE a.col1 = b.col1
  AND a.col2 = 1
  AND b.col2 != 1

As you can see, DISTINCT kills performance on a large table: mine has 100k records and grows daily.

I need all the rows, so I can't use a GROUP BY clause.

I'm looking for a better, faster solution. If it helps, I can change the whole table structure.

Did you maybe mean to write `WHERE a.col1 = b.col1` (with a `1` at the end), rather than `WHERE a.col1 = b.col2` like you have now? Because your current query doesn't match your sample results. – ruakh Dec 01 '11 at 23:36
Edit your answer. Where you say col2 = 1, you mean != 1. – dani herrera Dec 01 '11 at 23:36
Adding another compound index on `(col2, col1)` would help this, as @danihp points out. – ypercubeᵀᴹ Dec 01 '11 at 23:45
@ypercube, at this point the question is ambiguous because the query says 'b.col2 != 1'. It may or may not be correct. – dani herrera Dec 01 '11 at 23:48
@danihp It's correct because it's fetching results from b only. – tuze Dec 01 '11 at 23:51

score 2 · Accepted Answer · answered Dec 01 '11 at 23:36

SELECT a.* 
FROM table AS a 
WHERE col2 <> 1
  AND EXISTS
      ( SELECT *
        FROM table b
        WHERE b.col1 = a.col1 
          AND b.col2 = 1
      )
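
As a quick sanity check, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs both queries. SQLite only stands in for MySQL here, and the table name `t` is an assumption (the question uses the placeholder `table`). The script and its variable names are illustrative, not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question: (id, col1, col2).
ROWS = [
    (1, 5, 2), (2, 5, 1), (3, 8, 4), (4, 8, 1), (5, 8, 3), (6, 5, 2),
    (7, 2, 3), (8, 1, 4), (9, 2, 2), (10, 5, 2), (11, 5, 3), (12, 3, 1),
]

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name; the question only says "table".
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# The EXISTS form from the answer (ORDER BY added only to compare results).
EXISTS_QUERY = """
SELECT a.*
FROM t AS a
WHERE a.col2 <> 1
  AND EXISTS (SELECT * FROM t AS b WHERE b.col1 = a.col1 AND b.col2 = 1)
ORDER BY a.id
"""

# The DISTINCT self-join from the question, for comparison.
DISTINCT_QUERY = """
SELECT DISTINCT b.*
FROM t AS a, t AS b
WHERE a.col1 = b.col1 AND a.col2 = 1 AND b.col2 != 1
ORDER BY b.id
"""

exists_rows = conn.execute(EXISTS_QUERY).fetchall()
distinct_rows = conn.execute(DISTINCT_QUERY).fetchall()

for row in exists_rows:
    print(row)
print("same result as DISTINCT join:", exists_rows == distinct_rows)
```

On the sample data both queries return ids 1, 3, 5, 6, 10 and 11, the six rows the question asks for. EXISTS yields each outer row at most once, so no DISTINCT is needed.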

ypercubeᵀᴹ

Don't forget to create an index on col2, col1 – dani herrera Dec 01 '11 at 23:41
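
To make the index suggestion concrete, here is a rough sketch that creates a compound index on `(col2, col1)` and prints the query plan. It runs on in-memory SQLite rather than MySQL, so treat the plan output as indicative only; MySQL's optimizer may choose differently. The table name `t` and the index name `idx_t_col2_col1` are made up for the example.

```python
import sqlite3

# Sample rows copied from the question: (id, col1, col2).
ROWS = [
    (1, 5, 2), (2, 5, 1), (3, 8, 4), (4, 8, 1), (5, 8, 3), (6, 5, 2),
    (7, 2, 3), (8, 1, 4), (9, 2, 2), (10, 5, 2), (11, 5, 3), (12, 3, 1),
]

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name standing in for the question's "table".
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Hypothetical index name; the column order (col2, col1) follows the comment.
conn.execute("CREATE INDEX idx_t_col2_col1 ON t (col2, col1)")

QUERY = """
SELECT a.*
FROM t AS a
WHERE a.col2 <> 1
  AND EXISTS (SELECT * FROM t AS b WHERE b.col1 = a.col1 AND b.col2 = 1)
ORDER BY a.id
"""

# EXPLAIN QUERY PLAN is SQLite syntax; in MySQL you would use EXPLAIN.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
for step in plan:
    print(step[-1])  # last column is the human-readable plan step

indexed_ids = [row[0] for row in conn.execute(QUERY)]
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
print(indexed_ids)
```

The idea is that the subquery's two equality conditions (`b.col2 = 1` and `b.col1 = a.col1`) can both be answered from the index alone. Check the printed plan on your own engine and data before relying on it.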

I don't remember whether MySQL optimizes it or not, but you can replace `SELECT *` with `SELECT NULL` (or any other constant) in the subquery. – Vincent Savard Dec 01 '11 at 23:43
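
For what it's worth, EXISTS only tests whether the subquery returns at least one row, so the select list should not change the result. A small sketch to confirm that, again on in-memory SQLite with an assumed table name `t`; it says nothing about timing on MySQL.

```python
import sqlite3

# Sample rows copied from the question: (id, col1, col2).
ROWS = [
    (1, 5, 2), (2, 5, 1), (3, 8, 4), (4, 8, 1), (5, 8, 3), (6, 5, 2),
    (7, 2, 3), (8, 1, 4), (9, 2, 2), (10, 5, 2), (11, 5, 3), (12, 3, 1),
]

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name standing in for the question's "table".
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Same query as the answer, with the subquery's select list left as a slot.
TEMPLATE = """
SELECT a.id
FROM t AS a
WHERE a.col2 <> 1
  AND EXISTS (SELECT {select_list} FROM t AS b WHERE b.col1 = a.col1 AND b.col2 = 1)
ORDER BY a.id
"""

# Run the query once per select list and collect the returned ids.
results = {
    select_list: [row[0] for row in conn.execute(TEMPLATE.format(select_list=select_list))]
    for select_list in ("*", "NULL", "1")
}

for select_list, ids in results.items():
    print(f"SELECT {select_list}: {ids}")
```

All three variants return the same ids here, so the choice between `*`, `NULL` and a constant looks like a matter of style rather than correctness.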
