
Say I have a MySQL table with an index on the column 'name'.

I do this query:

select * from name_table where name = 'John';

Say there are 5 results that are returned from a table with 100 rows.

Say I now insert 1 million new rows, none of which have the name John, so there are still only 5 Johns in the table. Will the select statement be as fast as before? That is, will inserting all these rows have an impact on the read speed of an indexed table?

user2924127

2 Answers


Indexes have their own "tables", and when the MySQL engine determines that the lookup references an indexed column, the lookup happens against this structure. It isn't really a table per se, but the gist checks out.

That said, it will be nanoseconds slower, but not something you should concern yourself with.

More importantly, concern yourself with indexing pertinent data and with column order, as these have MUCH more of an impact on database performance.

To learn more about what is happening behind the scenes, prefix the query with EXPLAIN:

EXPLAIN select * from name_table where name = 'John';

Note: In addition to the column orders listed in the link, it is a good (nay, great) idea to place variable-length columns (VARCHAR) after their fixed-length counterparts (CHAR). During a lookup on a non-indexed column, the engine either has to read the row, read the column lengths, and then skip forward to the value, or it can read the table declaration and know the column always sits at offset X. It is more complicated behind the scenes, but if you can shift all fixed-length columns to the front, you will thank yourself. Basically:

1. Indexed columns.
2. Everything fixed-length, in order according to the link.
3. Everything variable-length, in order according to the link.
Mike
  • What we should be concerned about is the **cardinality** of the values in the index, and making sure the predicates in the query don't prevent the optimizer from making effective use of an index. To answer OP's question, yes it will be just as fast after he adds a gajillion rows, because only five rows have the value of `name` the database is looking for, and with a tiny number of operations, the database can locate the exact block in the index that contains those entries, no matter how many rows are in the table. – spencer7593 Jun 05 '15 at 23:35
  • @Mike - I don't think MySQL makes any difference based on order of the columns. Do you have evidence to support your recommendation? – Rick James Jun 09 '15 at 16:36
  • @RickJames - I included a relevant link, and Google returned a number more. – Mike Jun 09 '15 at 17:23

Yes, it will be just as fast.

(In addition to the excellent points made in Mike's answer...) there's an important point we should make regarding indexes (B-tree indexes in particular):

The entries in the index are stored "in order".

The index is also organized in a way that allows the database to very quickly identify the blocks in the index that contain the entries it's looking for (or the block that would contain entries, if no matching entries are there.)

What this means is that the database doesn't need to look at every entry in the index. Given a predicate like the one in your question:

  WHERE name = 'John'

with an index with a leading column of name, the database can eliminate vast swaths of blocks that don't need to be checked.

Blocks near the beginning of the index contain entries 'Adrian' through 'Anna'; a little later in the index, a block contains entries 'Caleb' through 'Carl'; further along in the index, 'James' through 'Jane'; etc.

Because of the way the index is organized, the database effectively "knows" that the entries we're looking for cannot be in any of those blocks (because the index is in order, there's no way the value 'John' could appear in those blocks we mentioned). So none of those blocks needs to be checked. (The database figures out, in just a very small number of operations, that 98% of the blocks in the index can be eliminated from consideration.)
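The block-elimination idea can be sketched with a plain binary search over a sorted list. This is a toy stand-in for B-tree index pages, not how InnoDB actually stores them; the block boundary names below are made up for illustration:

```python
import bisect

# Toy "index": the first key stored in each index block, kept in sorted
# order. (Illustrative values only; a real B-tree holds far more structure.)
block_first_keys = ["Adrian", "Caleb", "Ivan", "James", "Kyle", "Nina", "Tom"]

def find_block(name):
    """Return the position of the only block that could contain `name`."""
    # bisect_right finds the insertion point for `name`; the candidate
    # block is the one just before it (its first key is <= name).
    i = bisect.bisect_right(block_first_keys, name) - 1
    return max(i, 0)

# 'John' can only live in the block starting at 'James'; every other
# block is eliminated without ever being read.
print(block_first_keys[find_block("John")])  # → James
```

Because the keys are sorted, locating the one candidate block costs O(log n) comparisons no matter how many blocks exist, which is exactly why adding rows barely slows the lookup.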

High cardinality = good performance

The take away from this is that indexes are most effective on columns that have high cardinality. That is, there are a large number of distinct values in the column, and those values are unique or nearly unique.

This should clear up the answer to the question you were asking. You can add bazillions of rows to the table. If only five of those rows have a value of 'John' in the name column, then when you do a query

  WHERE name = 'John'

it will be just as fast. The database will be able to locate the entries you're looking for nearly as fast as it could when the table had only a thousand rows.

(As the index grows larger, it does add "levels" to traverse down to the leaf nodes, so it gets ever so slightly slower because of a tiny few more operations. Where performance really starts to bog down is when the InnoDB buffer cache is too small, and we have to wait for the (glacially slow, in comparison) disk I/O operations to fetch blocks into memory.)
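The scenario from the question can be reproduced end to end. This sketch uses SQLite from Python rather than MySQL (so the buffer-pool details above don't apply), but the B-tree index mechanics are the same; the table name comes from the question, while the index name `idx_name` is made up here. After piling in 100,000 non-John rows, the query plan still shows an index search, not a table scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON name_table (name)")

# 5 Johns, then a pile of rows that are not John.
conn.executemany("INSERT INTO name_table (name) VALUES (?)",
                 [("John",)] * 5)
conn.executemany("INSERT INTO name_table (name) VALUES (?)",
                 [("row%d" % i,) for i in range(100_000)])

# The plan's detail column mentions idx_name: an index search, not a scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM name_table WHERE name = 'John'"
).fetchall()
print(plan)

# And the query still returns exactly the 5 matching rows.
rows = conn.execute(
    "SELECT * FROM name_table WHERE name = 'John'").fetchall()
print(len(rows))  # → 5
```

The exact wording of the plan varies by SQLite version, but the point carries over: the lookup cost tracks the handful of matching index entries, not the total row count.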

Low cardinality = poor performance

Indexes on columns with low cardinality are much less effective. Consider, for example, a column that has only two possible values, with an even distribution of values across the rows in the table (about half of the rows have one value, and the other half have the other value). In this case, the database can't eliminate 98% of the blocks, or even 90% of the blocks. The database has to slog through half the blocks in the index, and then (usually) perform a lookup into the pages of the underlying table to get the other values for each row.

But with gazillions of rows with a column gender, with two values 'M' and 'F', an index with gender as a leading column will not be effective in satisfying a query

  WHERE gender = 'M'

... because we're effectively telling the database to retrieve half the rows in the table, and those rows are likely to be evenly distributed through the table. Nearly every page in the table is going to contain at least one row we need, so the database is going to opt for a full table scan (looking at every row in every block in the table) to locate the rows, rather than using an index.
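The unselective predicate is easy to demonstrate. Again using SQLite as a stand-in (which optimizer each engine picks for this case varies, so the sketch measures selectivity rather than the chosen plan; table and column names are invented for the example):

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, gender TEXT)")
conn.execute("CREATE INDEX idx_gender ON people (gender)")

# Two values, evenly distributed across 100,000 rows.
random.seed(42)
conn.executemany("INSERT INTO people (gender) VALUES (?)",
                 [(random.choice("MF"),) for _ in range(100_000)])

total = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
males = conn.execute(
    "SELECT COUNT(*) FROM people WHERE gender = 'M'").fetchone()[0]

# The predicate keeps roughly half the table, so the index can eliminate
# almost nothing; a full scan is about as cheap as walking the index.
print(round(males / total, 2))  # roughly 0.5
```

Compare this with the `name = 'John'` case, where the predicate keeps 5 rows out of 100,005: that is the difference cardinality makes.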


So, in terms of performance for looking up rows in the table using an index... the size of the table isn't really an issue. The real issue is the cardinality of the values in the index, and how many distinct values we're looking for, and how many rows need to be returned.

spencer7593
  • The levels of the BTree are only slightly important. A million-row table will have about 3 levels; a _trillion_ row table will have about 6. – Rick James Jun 09 '15 at 16:39
  • I concur. The number of levels in the B-tree has almost no impact on performance. The wording of that idea in my answer is awkward: "ever so slightly slower because of a tiny few more operations"... referring to the possible addition of a level or two in the index. – spencer7593 Jun 09 '15 at 22:57