1

In the following two scenarios:

create table `Table` (
    `id` int(10),
    `column1` int(10),
    `column2` int(10),
    KEY (`column2`)
);

and

create table `Table` (
    `id` int(10),
    `column1` int(10),
    `column2` int(10),
    KEY (`column1`, `column2`)
);

Now consider the query select * from Table where column2=xxx;

Is there any possibility that the second scenario will be faster than the first scenario, for example in the case where the rows just happen to cluster densely on column1?

Or can we say with 100% certainty that the first scenario is always at least as fast as the second scenario?

I tried searching composite/compound key speed but cannot find answer with 100% certainty when compared to single key.

cr001
  • 655
  • 4
  • 16

3 Answers3

1

Read here: https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html

Your first index will be faster since the query will be an index scan. Your composite index will actually cause the query to be a slow row scan. The index will only be used when the column you are looking for is leftmost in the index. Since column2 is not leftmost, your index won't be used.

The composite index would only be used for queries like:

  • select * from Table where column1='X'
  • select * from Table where column1='X' and column2='Y'

The big thing with indexes is knowing how you'll be querying the data. If you don't know, prematurely optimizing may not do you any good.

Erik Nedwidek
  • 6,134
  • 1
  • 25
  • 25
1

Is there any possibility that the second scenario will be faster than the first scenario

Yes.

This is a scenario when the table statistic is so incorrect that server errorneously uses the index instead of table scan. For example, the statistic shows that approximately 1% of rows contains the value xxx whereas really this is 50%.

Of course the probability of such situation is extremely low, but it is not zero nevertheless.

ANALYZE TABLE will fix this issue.

Akina
  • 39,301
  • 5
  • 14
  • 25
1

Literal answer: 100% is not guaranteed; also "single-column" versus "composite" is not the right question, as Erik points out.

Practical answer: Given where column2=xxx, you should have an index starting with column2.

Long answer:

The database engine applies a lot intelligence toward "caching" indexes and data. The goal is to make the typical query fast on average.

If there is not much "churn", blocks of data and indexes are brought into RAM (the "buffer_pool") and they live there. The first query after a restart will have to do some I/O to fetch blocks from disk; this is slow. Subsequent queries can skip some or all of the fetches; hence they are faster. That is, other queries can happen to make _this query faster. Hence the 100% is defeated. (And there is no way to achieve "100%" without slowing everything down to a constant, but inefficient, speed.)

More on creating optimal indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Let me confuse things further. In your specific 3-column table, and with your specific SELECT *, here is the usual pecking order; slowest to fastest

INDEX(column2, column1) -- index's BTree is bulkier than the next
INDEX(column2)          -- good (and recommended)
INDEX(column2, column1, id) -- "covering" all of "*" is in the index

When building indexes, it is best to collect all the important queries (Select, Update, Delete) together to decide which set of indexes to have. This Question has only addressed one specific select for that specific schema.

Rick James
  • 135,179
  • 13
  • 127
  • 222