4

I have 3 tables in MySQL:

innodb_verion     1.1.8
version           5.5.29-log
Table: artist
Columns:
    id              int(11)
    new_id          varchar(50)
Table: collection
Columns:
    id              int(11)
    new_id          varchar(50)
Table: artist_collection
Columns:
    artist_id       int(11) PK
    collection_id   int(11) PK

The artist and collection tables contain 100 rows (I just pulled out a subset to test this problem)

The artist_collection table contains 8 million+ rows and has a PRIMARY KEY created as:

ALTER TABLE artist_collection ADD PRIMARY KEY (artist_id,collection_id);

When I do identical joins for artist and collection, collection takes significantly longer than artist:

SELECT artist.new_id AS 'i' FROM artist INNER JOIN artist_collection ON artist.id=artist_collection.artist_id LIMIT 100;
100 row(s) returned  0.001 sec
SELECT collection.new_id AS 'i' FROM collection INNER JOIN artist_collection ON collection.id=artist_collection.collection_id LIMIT 100;
100 row(s) returned  2.636 sec

This is a big deal because these original tables are 10 million+ rows and I don't want to be using LIMIT in my queries. Even when I did LIMIT 1000, the slow query would take ~35 seconds to complete, whereas the fast one was ~0.020 seconds. I then did an EXPLAIN EXTENDED on both queries to see what was really happening:

EXPLAIN EXTENDED SELECT artist.id AS 'i' FROM artist INNER JOIN artist_collection ON artist.id=artist_collection.artist_id LIMIT 100;

id    select_type    table              type    possible_keys    key      key_len    ref              rows    filtered    Extra
1     SIMPLE         artist             ALL     NULL             NULL     NULL       NULL             82      100.00
1     SIMPLE         artist_collection  ref     PRIMARY          PRIMARY  4          tbl.artist.id    2       100.00      Using index
EXPLAIN EXTENDED SELECT collection.id AS 'i' FROM collection INNER JOIN artist_collection ON collection.id=artist_collection.collection_id LIMIT 100;

id    select_type    table              type    possible_keys    key      key_len    ref      rows     filtered    Extra
1     SIMPLE         collection         ALL     NULL             NULL     NULL       NULL     101      100.00
1     SIMPLE         artist_collection  index   NULL             PRIMARY  8          NULL     8725465  100.00      Using where; Using index; Using join buffer

As you can see, it is not treating the queries equally. I then ran:

ALTER TABLE artist_collection DROP PRIMARY KEY;
ALTER TABLE artist_collection ADD PRIMARY KEY (collection_id,artist_id);

This caused the problem to reverse itself, where now the artist JOIN was taking very long. So my questions are this:

1) Does MySQL interpret the first and second entries of a PRIMARY KEY differently?
2) What exactly is happening?
3) How do I make the queries equal (fast)?

andeo2L
  • 111
  • 1
  • 2

1 Answers1

1

Think about the records in that compound index as stored this way:

+---------------+-----------+
| collection_id | artist_id |
+---------------+-----------+
|             1 |         1 |
|             1 |         2 |
|             1 |         3 |
|             2 |         1 |
|             2 |         2 |
|             3 |         1 |
+---------------+-----------+

Given the fact that data is first sorted by collection_id looking for all collection_id = 1 is very simple. However, looking for all artist_id is not.

Even if you search for both collection_id = 1 and artist_id = 1 then it is also an easy task (you don't need to scan all the artist_id values).

A similar thing is happening to MySQL. When you join only by collection_id this index is very useful. When you join by both collection_id and artist_id this index is again useful. However, when joining only on artist_id this index won't be of much help.

If you are going to also join only on artist_id then you might want to add another index on artist_id. You should see in your exaplain that that new index will be used in those queries.

Mosty Mostacho
  • 42,742
  • 16
  • 96
  • 123