42

Lets say I have a simple many-to-many table between tables "table1" and "table2" that consists from two int fields: "table1-id" and "table2-id". How should I index this linking table?

I used to just make a composite primary index (table1-id,table2-id), but I read that this index might not work if you change order of the fields in the query. So what's the optimal solution then - make independent indexes for each field without a primary index?

Thanks.

serg
  • 109,619
  • 77
  • 317
  • 330
  • 1
    The optimal indexing for many-to-many is discussed here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table – Rick James Mar 10 '20 at 16:33

3 Answers3

42

It depends on how you search.

If you search like this:

/* Given a value from table1, find all related values from table2 */
SELECT *
FROM table1 t1
JOIN table_table tt ON (tt.table_1 = t1.id)
JOIN table2 t2 ON (t2.id = tt.table_2)
WHERE t1.id = @id

then you need:

ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 (table_1, table_2)

In this case, table1 will be leading in NESTED LOOPS and your index will be usable only when table1 is indexed first.

If you search like this:

/* Given a value from table2, find all related values from table1 */
SELECT *
FROM table2 t2
JOIN table_table tt ON (tt.table_2 = t2.id)
JOIN table1 t1 ON (t1.id = tt.table_1)
WHERE t2.id = @id

then you need:

ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 (table_2, table_1)

for the reasons above.

You don't need independent indices here. A composite index can be used everywhere where a plain index on the first column can be used. If you use independent indices, you won't be able to search efficiently for both values:

/* Check if relationship exists between two given values */
SELECT 1
FROM table_table
WHERE table_1 = @id1
  AND table_2 = @id2

For a query like this, you'll need at least one index on both columns.

It's never bad to have an additional index for the second field:

ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 PRIMARY KEY (table_1, table_2)
CREATE INDEX ix_table2 ON table_table (table_2)

Primary key will be used for searches on both values and for searches based on value of table_1, additional index will be used for searches based on value of table_2.

Flavio Wuensche
  • 9,460
  • 1
  • 57
  • 54
Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • Thanks for the detailed answer, but what if I search both ways? Also I am using Hibernate so I am not even sure which way it is using. – serg Feb 20 '09 at 22:02
  • 7
    IF you search both ways, you'll need TWO indices: one composite for the PRIMARY KEY and one plain for the column which is second in the PRIMARY KEY. It's in the bottom of my post. – Quassnoi Feb 20 '09 at 22:05
  • In the two-index example at the end, would there be any advantage to having the non-pk index be a composite of the two columns in reverse rather than just the single column? i.e., `CREATE INDEX ix_table2_table1 ON table_table (table_2, table_1)` rather than `CREATE INDEX ix_table2 ON table_table (table_2)` ? – Collin Barrett Feb 08 '18 at 14:30
  • 2
    To answer my own question, it seems the single column index suggested by @Quassnoi implicitly includes the second column. So, it's just a matter of brevity vs. explicitness. source: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table – Collin Barrett Feb 08 '18 at 14:38
5

As long as you are specifying both keys in the query, it doesn't matter what order they have in the query, nor does it matter what order you specify them in the index.

However, it's not unlikely that you will sometimes have only one or the other of the keys. If you sometimes have id_1 only, then that should be the first (but you still only need one index).

If you sometimes have one, sometimes the other, sometimes both, you'll need one index with both keys, and a second (non-unique) index with one field - the more selective of the two keys - and the primary composite index should start with the other key.

dkretz
  • 37,399
  • 13
  • 80
  • 138
  • 1
    I like your answer the best but do not have the expertise to verify it. – jpierson Jun 28 '10 at 17:33
  • What do you mean by 'more selective'? – André Laszlo Jan 20 '11 at 09:09
  • 2
    A more technical word is "cardinality". It means how many different values there are for the field. At one extreme, high cardinality, every value is unique. On the other hand, some fields might have only a few distinct values, in which case an index doesn't save much in the way of disk reads. – dkretz Jan 21 '11 at 02:04
  • Paragraph 1 is correct only if the `WHERE` clause has both columns being tested with `= constant`. Otherwise, the order of the columns _in the index_ may make a big difference. – Rick James Mar 10 '20 at 16:24
  • Paragraph 3 is partially incorrect. "Selectivity" (aka "cardinality") of individual columns is irrelevant _in a composite (multi-column) index_. Think of a composite key as being the concatenation of the columns; the selectivity of individual columns is no longer relevant. – Rick James Mar 10 '20 at 16:26
0

@Quassnoi, in your first query you're actually using only tt.table_1 key as we can see from the WHERE-clause: WHERE t1.id = @id. And in the second query - only tt.table_2.

So the multi-column index could be useful only in the third query because of WHERE table_1 = @id1 AND table_2 = @id2. If the queries of this kind are not going to be used, do you think it's worth to use two separate one-column indices instead?

RocketR
  • 3,626
  • 2
  • 25
  • 38
  • A separate one-column index is virtually always a waste -- a multi-column ("composite") index _starting_ with that one column is sufficient. – Rick James Mar 10 '20 at 16:30
  • 1
    Often, but not always. Having more columns in an index increases the size of the index and the frequency of its updates. So adding extra columns to an index should be done only when there's queries that benefit from it. A better phrasing would be: remove an index if it constitutes a prefix of another index. – RocketR Jun 05 '20 at 10:53