12

I've researched this, but I still cannot explain why:

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155

Is significantly slower than:

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = 23155

115ms Vs 478ms. They are both using InnoDB and there are relationships defined. The 'card_legality' contains approx 200k rows, while the 'legality' table contains 11 rows. Here is the structure for each:

CREATE TABLE `card_legality` (
  `card_id` varchar(8) NOT NULL DEFAULT '',
  `legality_id` int(3) NOT NULL,
  `cl_boolean` tinyint(1) NOT NULL,
  PRIMARY KEY (`card_id`,`legality_id`),
  KEY `legality_id` (`legality_id`),
  CONSTRAINT `card_legality_ibfk_2` FOREIGN KEY (`legality_id`) REFERENCES `legality` (`legality_id`),
  CONSTRAINT `card_legality_ibfk_1` FOREIGN KEY (`card_id`) REFERENCES `card` (`card_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

And:

CREATE TABLE `legality` (
  `legality_id` int(3) NOT NULL AUTO_INCREMENT,
  `l_name` varchar(16) NOT NULL DEFAULT '',
  PRIMARY KEY (`legality_id`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;

I could simply use LEFT-JOIN, but it doesn't seem quite right... any thoughts, please?

UPDATE: As requested, I've included the results of explain for each. I had run it previously, but I dont pretend to have a thorough understanding of it..

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  cl  ALL PRIMARY NULL    NULL    NULL    199747  Using where
1   SIMPLE  l   eq_ref  PRIMARY PRIMARY 4   hexproof.co.uk.cl.legality_id   1   

AND, inner join:

id  select_type table   type    possible_keys   key key_len         ref                         rows    Extra
1   SIMPLE  l   ALL PRIMARY NULL    NULL    NULL    11  
1   SIMPLE  cl  ref PRIMARY,legality_id legality_id 4   hexproof.co.uk.l.legality_id    33799   Using where
animuson
  • 53,861
  • 28
  • 137
  • 147
Ben
  • 251
  • 1
  • 5
  • 13
  • By the way `card_id` is a VARCHAR as I have no choice, I wouldn't normally accept that. – Ben Nov 21 '11 at 20:23

4 Answers4

10

It is because of the varchar on card_id. MySQL can't use the index on card_id as card_id as described here mysql type conversion. The important part is

For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:

SELECT * FROM tbl_name WHERE str_col=1;

The reason for this is that there are many different strings that may convert to the value 1, such as '1', ' 1', or '1a'.

If you change your queries to

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
INNER JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'

and

SELECT cl.`cl_boolean`, l.`l_name`
FROM `card_legality` cl
LEFT JOIN `legality` l ON l.`legality_id` = cl.`legality_id`
WHERE cl.`card_id` = '23155'

You should see a huge improvement in speed and also see a different EXPLAIN.

Here is a similar (but easier) test to show this:

> desc id_test;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | varchar(8) | NO   | PRI | NULL    |       |
+-------+------------+------+-----+---------+-------+
1 row in set (0.17 sec)

> select * from id_test;
+----+
| id |
+----+
| 1  |
| 2  |
| 3  |
| 4  |
| 5  |
| 6  |
| 7  |
| 8  |
| 9  |
+----+
9 rows in set (0.00 sec)

> explain select * from id_test where id = 1;
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra                    |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
|  1 | SIMPLE      | id_test | index | PRIMARY       | PRIMARY | 10      | NULL |    9 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)


> explain select * from id_test where id = '1';
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | id_test | const | PRIMARY       | PRIMARY | 10      | const |    1 | Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)

In the first case there is Using where; Using index and the second is Using index. Also ref is either NULL or CONST. Needless to say, the second one is better.

Andreas Wederbrand
  • 38,065
  • 11
  • 68
  • 78
  • aha! Brilliant! You are correct, searching on the 'card_id' as an INT, when in fact it was a VARCHAR made these queries nearly 600x slower than they should have been! Thank you Andreas :) – Ben Nov 21 '11 at 20:50
  • I've added a link where this is explained. It's good reading. – Andreas Wederbrand Nov 21 '11 at 21:01
4

L2G has it pretty much summed up, although I suspect it could be because of the varchar type used for card_id.

I actually printed out this informative page for benchmarking / profiling quickies. Here is a quick poor-mans profiling technique:

Time a SQL on MySQL
Enable Profiling
mysql> SET PROFILING = 1
...
RUN your SQLs
...
mysql> SHOW PROFILES;

+----------+------------+-----------------------+
| Query_ID | Duration   | Query                 |
+----------+------------+-----------------------+
|        1 | 0.00014600 | SELECT DATABASE()     |
|        2 | 0.00024250 | select user from user |
+----------+------------+-----------------------+
mysql> SHOW PROFILE for QUERY 2;

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000034 |
| checking query cache for query | 0.000033 |
| checking permissions           | 0.000006 |
| Opening tables                 | 0.000011 |
| init                           | 0.000013 |
| optimizing                     | 0.000004 |
| executing                      | 0.000011 |
| end                            | 0.000004 |
| query end                      | 0.000002 |
| freeing items                  | 0.000026 |
| logging slow query             | 0.000002 |
| cleaning up                    | 0.000003 |
+--------------------------------+----------+

Good-luck, oh and please post your findings!

Stephane Gosselin
  • 9,030
  • 5
  • 42
  • 65
  • This is really useful information stefgosselin, thank you. Your prognosis was actually correct, but the explanation by Andreas explained it. Thanks for your help :) – Ben Nov 21 '11 at 20:52
3

I'd try EXPLAIN on both of those queries. Just prefix each SELECT with EXPLAIN and run them. It gives really useful info on how mySQL is optimizing and executing queries.

Laurel
  • 5,965
  • 14
  • 31
  • 57
L2G
  • 846
  • 6
  • 28
  • 3
    Based on the topic at hand I'd say there's a pretty big chance the OP is aware of how to use "EXPLAIN". Either way this kind of info should be put in a comment and not in an answer as you are not attempting to answer his question. – Naatan Nov 21 '11 at 20:31
  • Hi L2G, thanks for your comment. I'll persevere with the **EXPLAIN** function and see what I can find. The mysql manual is somewhat lacking (or it is to me at least). Many Thanks – Ben Nov 21 '11 at 20:48
0

I'm pretty sure that MySql has better optimization for Left Joins - no evidence to back this up at the moment.

ETA : A quick scout round and I can't find anything concrete to uphold my view so.....

K. Bob
  • 2,668
  • 1
  • 18
  • 16