0

I have a working query that seems awfully inefficient; I'm wondering if I'm missing a simple way to improve it.

Simple table:

id  date        master_id
-------------------------
1   2015-02-01  0
2   2015-02-02  0
3   2015-02-03  0
4   2015-02-04  1
5   2015-02-02  1
6   2015-02-17  1
7   2015-02-27  1
8   2015-01-01  1

Objective: Get all rows where the master_id is zero, OR the master_id is not zero and no other rows of the same master_id have an earlier date. Order every result by date.

Current query, using a groupwise minimum subquery to create the second WHERE condition.


SELECT *
FROM `test`
WHERE `master_id` =0
OR `id` IN (

    SELECT test.`id`
    FROM (
        SELECT `master_id`, MIN(`date`) AS mindate
        FROM `test`
        WHERE `master_id`  0       
        GROUP BY `master_id`
    ) AS x
    INNER JOIN `test` ON x.`master_id` = test.`master_id`
    AND x.mindate= test.`date`
)
ORDER BY `date`

It works, but the EXPLAIN makes it seem inefficient:

id  select_type         table       type    possible_keys   key         key_len     ref     rows    Extra
-------------------------------------------------------------------------------------------------------------
1   PRIMARY             test        ALL     NULL            NULL        NULL        NULL    8       Using where; Using filesort
2   DEPENDENT SUBQUERY  derived3    system  NULL            NULL        NULL        NULL    1   
2   DEPENDENT SUBQUERY  test        eq_ref  PRIMARY         PRIMARY     4           func    1       Using where
3   DERIVED             test        ALL     NULL            NULL        NULL        NULL    8       Using where; Using temporary; Using filesort

Can I improve this? Should I break it into two queries, one for ID=0 and one for the groupwise min? Thanks in advance.

mike
  • 53
  • 6

2 Answers2

0

Avoiding the inner join can improve the query:

SELECT *
FROM `test`
WHERE `master_id` =0
OR `id` IN (
    SELECT t1.id 
    FROM (SELECT * 
        FROM test t2 
        WHERE t2.master_id!=0   
        ORDER BY t2.date ASC) t1
    GROUP BY t1.master_id
)
ORDER BY `date`;
aimstone
  • 106
  • 7
  • This looks good; will the first subquery always select the ID of the first row from the second subquery, respecting its order? When mysql groups rows without any aggregate functions, does it always just pick the first value it finds for each column that matches the group by condition? – mike Feb 12 '15 at 19:39
0

How about this...

SELECT * FROM test WHERE master_id = 0
UNION 
SELECT x.*
  FROM test x
  JOIN (SELECT master_id,MIN(date) min_date FROM test GROUP BY master_id) y
    ON y.master_id = x.master_id 
   AND y.min_date = x.date;
Strawberry
  • 33,750
  • 13
  • 40
  • 57