27

In a typical many-many arrangement like this...

Movies       Actors       Movies_Actors
------       ------       -------------
movie_ID     actor_ID     FK_movie_ID
title        name         FK_actor_ID

... how should the association table ('Movies_Actors') be indexed for optimal read speed?

I usually see this done only with the composite primary key in the association table, like so:

CREATE TABLE Movies_Actors (
  FK_movie_ID INTEGER,
  FK_actor_ID INTEGER,
  PRIMARY KEY (FK_movie_ID, FK_actor_ID)
)

However, this seems like the index will only be useful when searching for both movie_ID and actor_ID (although I'm not certain on whether a composite index also works for the individual columns).

Since both "what actors are in Movie X" and "what movies has actor Y been in" will be the common queries for this table, it seems like there should be an individual index on each column to quickly locate actors and movies on their own. Does a composite index effectively do this? If not, having a composite index seems pointless on this table. And if a composite index is pointless, what to do about a primary key? The candidate key is clearly the composite of the two columns, but if the resulting composite index is pointless (it mustn't be?) it seems like a waste.

Also, this link adds some confusion and indicates that it might even be useful to actually specify two composite indices... one of them as (FK_movie_ID, FK_actor_ID), and the other in reverse as (FK_actor_ID, FK_movie_ID), with the choice of which is the primary key (and thus usually clustered) and which is 'just' a unique composite index being based on which direction is queried more.

What is the real story? Does a composite index automatically effectively index each column for searching on one or the other? Should the optimal (in read speed, not size) association table have a composite index in each direction and one on each column? What are the behind-the-scene mechancs?


EDIT: I found this related question that for some reason I didn't locate before posting... How to properly index a linking table for many-to-many connection in MySQL?

Community
  • 1
  • 1
Russ
  • 10,835
  • 12
  • 42
  • 57

2 Answers2

17

(although I'm not certain on whether a composite index also works for the individual columns).

Yes, it can. But only the prefix: http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

Also, this link adds some confusion and indicates that it might even be useful to actually specify two composite indices... one of them as (FK_movie_ID, FK_actor_ID), and the other in reverse as (FK_actor_ID, FK_movie_ID),

That's actually the thing to do.

Take one as clustering index, and the other as non-clustering index that will anyways include the clustering index key--hence no need to include the that column again (thx to JNK).

CREATE CLUSTERED INDEX a on Movies_Actors (fk_movie_id, fk_actor_id);
CREATE NONCLUSTERED INDEX b on Movies_Actors (fk_actor_id);

What is the real story?

http://Use-The-Index-Luke.com/ :)

Does a composite index automatically effectively index each column for searching on one or the other?

No. Only the prefix of the index. If you have an index (a,b,c), the query a=? and b=? can use the index. However c=? can't, nor can b=? and c=?.

Should the optimal (in read speed, not size) association table have a composite index in each direction and one on each column?

If you need to join in both directions, yes ("composite index in each direction") and no ("one on each column").

What are the behind-the-scene mechanics?

Well, same link again.

Speaking SQL Server, you might eventually also consider an indexed view. That's kind of pre-joining. Two indexes, as above, might also be fast enough.

Markus Winand
  • 8,371
  • 1
  • 35
  • 44
  • Good answer, thanks. One thing: you indicated that two composite indices with one the reverse of the other is "the thing to do", but then later said that this should be done *and* have individual indices on each column "if you need to join in both directions". Which is it? If the first column in the index can be used as if the column were indexed alone, isn't adding single indexes on each column a waste of time? – Russ Jan 17 '11 at 15:49
  • Also - interesting link, thanks! Looks like there is some good info there. The Q&A forum implementation seems vaguely familiar... – Russ Jan 17 '11 at 15:51
  • @Russ - vaguely, there are just the people missing. I have edited the response above, seems that I missed that "and one on each column" part. – Markus Winand Jan 17 '11 at 15:57
  • 2
    @Russ, Markus - Putting a covering index on both columns, with reversed order, is a waste of space. Having an index on `A,B` and one on `B` is equivalent to having one on `A,B` and one on `B,A`, except you don't need the extra index space and corresponding updates/inserts if you only index one column in the second index. If they select both columns, it will use the covering index no matter the order. – JNK Jan 17 '11 at 16:05
  • @JNK - yes and no. First the no: select B where A=? is best served with covering (A, B) and vice versa. The case selecting select * where A=? and B=? is not the issue in this sample, i'd say. second part: yes, in that sample the INCLUDE clause should not be used, because if there is a covering index, the key of the covering index is anyways included in the non-clustered index, hence it automatically covers the other column. – Markus Winand Jan 17 '11 at 16:11
2

In SQL Server, a composite index can be used for a single field search for the first column only. That means you should have an additional, one field index on FK_actor_id if there will be searches on that field without FK_Movie_id in the same query.

JNK
  • 63,321
  • 15
  • 122
  • 138