I cannot understand why the first query, which is using a derived table, is slower than the second one. My table:
CREATE TABLE `test` (
`someid` binary(16) NOT NULL,
`indexedcolumn1` int(11) NOT NULL,
`indexedcolumn2` int(10) unsigned NOT NULL,
`data` int(11) NOT NULL,
KEY `indexedcolumn1` (`indexedcolumn1`),
KEY `indexedcolumn2` (`indexedcolumn2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
This table contains 4.514.856 rows
The faster query:
SELECT SUM(isSame) AS same, SUM(isDifferent) AS diff, SUM(isNotSet) AS notSet, indexedcolumn1 FROM (
SELECT
CASE WHEN t.indexedcolumn1 = t.data
THEN 1
ELSE 0
END AS isSame,
CASE WHEN t.indexedcolumn1 != t.data
THEN 1
ELSE 0
END AS isDifferent,
CASE WHEN t.data = 0
THEN 1
ELSE 0
END AS isNotSet,
indexedcolumn1
FROM
test as t
WHERE
t.indexedcolumn2 >= 10000000
)AS tempTable GROUP BY indexedcolumn1;
Result:
72 rows in set (4.70 sec)
The slower query:
SELECT
SUM(CASE WHEN t.indexedcolumn1 = t.data
THEN 1
ELSE 0
END) AS same,
SUM(CASE WHEN t.indexedcolumn1 != t.data
THEN 1
ELSE 0
END) AS diff,
SUM(CASE WHEN t.data = 0
THEN 1
ELSE 0
END) AS notSet,
indexedcolumn1
FROM
test as t
WHERE
t.indexedcolumn2 >= 10000000
GROUP BY indexedcolumn1;
Result:
72 rows in set (5.90 sec)
I thought you should avoid a derived table whenever its possible. Even EXPLAIN does not give any hint: for query1:
+----+-------------+------------+------+----------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+----------------+------+---------+------+---------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2257428 | Using temporary; Using filesort |
| 2 | DERIVED | t | ALL | indexedcolumn2 | NULL | NULL | NULL | 4514856 | Using where |
+----+-------------+------------+------+----------------+------+---------+------+---------+---------------------------------+
for query 2:
+----+-------------+-------+-------+---------------------------+------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------+------------+---------+------+---------+-------------+
| 1 | SIMPLE | t | index | indexedcolumn1,indexedcolumn2 | indexedcolumn1 | 4 | NULL | 4514856 | Using where |
+----+-------------+-------+-------+---------------------------+------------+---------+------+---------+-------------+
I also tried the tests several times, always with the same result: The first query was faster.... But why? The results are the same.
EDIT: I did a additional test: I removed the where clause. Even then I get better results for the first query (EXPLAIN):
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4514856 | Using temporary; Using filesort |
| 2 | DERIVED | t | ALL | NULL | NULL | NULL | NULL | 4514856 | NULL |
+----+-------------+------------+------+---------------+------+---------+------+---------+---------------------------------+
Explain Query 2:
+----+-------------+-------+-------+---------------+------------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+---------+-------+
| 1 | SIMPLE | t | index | indexedcolumn1 | indexedcolumn1 | 4 | NULL | 4514856 | NULL |
+----+-------------+-------+-------+---------------+------------+---------+------+---------+-------+