This select query takes about 20 seconds to complete.
select Count(*)
from products as bad_rows
inner join (
select pid, MAX(last_updated_date) as maxdate
from products
group by pid
having count(*) > 1
) as good_rows on good_rows.pid= bad_rows.pid
and good_rows.maxdate <> bad_rows.last_updated_date
where bad_rows.available = 0
The delete on the other hand is still running after 30 minutes !
delete bad_rows
from products as bad_rows
inner join (
select pid, MAX(last_updated_date) as maxdate
from products
group by pid
having count(*) > 1
) as good_rows on good_rows.pid= bad_rows.pid
and good_rows.maxdate <> bad_rows.last_updated_date
where bad_rows.available = 0
Why ?
Table Schema is as follows:
Explain for the select is as follows:
+----+-------------+------------+------+---------------+------+---------+------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+-------+--------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6253 | |
| 1 | PRIMARY | bad_rows | ALL | NULL | NULL | NULL | NULL | 34603 | Using where; Using join buffer |
| 2 | DERIVED | products | ALL | NULL | NULL | NULL | NULL | 34603 | Using temporary; Using filesort|
+----+-------------+------------+------+---------------+------+---------+------+-------+--------------------------------