0

I have a single table with four columns...

 `id` INT(11) NOT NULL AUTO_INCREMENT   
 `tid` INT(11) NOT NULL
 `cid` INT(11) NOT NULL
 `name` NVARCHAR(4096) NULL DEFAULT NULL

id is the unique primary key. The other columns are not unique.

I want to return the list of all id values that have specific tid and cid values and are sorted by the name. So this...

 select id
 from myTable
 where cid = 1 && tid = 1
 order by name

There are about 125k records in the table and there should be around 50k that happen to match this criteria. All four columns have individual indexes.

On my machine the query takes around 140ms to run. I need to get this down to around 20ms or better. I thought the solution was to add a new covering index that is defined against cid, tid and name, in that order. Did not make any difference though.

Any ideas? Is my covering index incorrectly setup?

Phil Wright
  • 22,580
  • 14
  • 83
  • 137
  • 4
    Your column `name` is horribly misnamed, given its enormous size. Sorting by a 4k column makes no sense and, yes, will be painfully slow. – elixenide Oct 19 '15 at 04:20
  • But I thought that the covering index would have effectively already done the sorting and so the hit is taken on the insert and update. Surely the select should be fast because it looks checks the index only? – Phil Wright Oct 19 '15 at 04:23
  • depends on what you need. Enormous it is.. why though? If you have a good reason, seconds should not be an issue here. – Mugé Oct 19 '15 at 04:24
  • Sorting by a huge column will always be slow. Have you tried running an `EXPLAIN` on this query to see what indexes it's using? Also, keep in mind that 50k rows at 4K per row is 200 MB. That is a *huge* amount of data to process in 20ms. You need to rethink what you're trying to do here. – elixenide Oct 19 '15 at 04:28
  • 1
    An index usually helps make searches more efficient. It *may* be used to optimize sorting [*in some cases*](https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html), but it's not guaranteed to work that way. – code_dredd Oct 19 '15 at 04:31
  • I just make the name column 128 chars long and now it runs super fast. But at 256 chars it runs slow again. Seems to be some cut-off in the engine between those values that has an impact. – Phil Wright Oct 19 '15 at 04:33
  • @mugé I'm saying I'm not familiar with anything that can be called a "name" that is 4K characters long (or even 2k or 1k, if we're talking about multi-byte string representations). It seems likely that the column stores something other than names or is simply much bigger than it needs to be. – elixenide Oct 19 '15 at 04:41
  • @PhilWright do you actually need that column to store 4K-character values? And do you actually need to sort by those values? – elixenide Oct 19 '15 at 04:42
  • first off there is no info on the index creation, just sorta like your description. show it. The db engines don't hop from index to index during query plan execution. They pick one. It sounds like you need a composite index, not individual indexes most of which won't be used – Drew Oct 19 '15 at 04:44
  • @EdCottrell. apologies. I must have blanked out the enormous size of nvarchar above 4K, which can only go up to (4000) BTW. 'Name' is wrong as it is not descriptive to some string. – Mugé Oct 19 '15 at 11:22
  • What does not make sense to me is running a query and organizing it by such long string unless there is some unified content. Outside of its logic, I will assume that running a string will take up longer time, and I will also speculate the end result might be unreliable. If the content of such long string is of such importance requiring some sorting, then either of the IDs (tid, cid) may contain some short descriptives, or an additional sorting type might be needed to define the data under which category it goes. I hope I make sense. – Mugé Oct 19 '15 at 11:37

2 Answers2

2

I think there're some problems with the query and the table definition itself.

  • Table.name is a 4K char column
  • The query is sorting by that column

You're sorting based on a column in which you're storing strings. In order to sort by strings, string comparisons have to be performed. String comparison tends to be a slow operation and, given the size of the column you're using, it's very likely to cause a noticeable performance hit.

We don't have an indication of the contents of your name column and it seems difficult to think of an actual name that would require that many characters.

If this string has several pieces of data that are conceptually different, perhaps the column should be broken down into multiple separate columns, if possible, and then normalized as appropriate.

If you can break the contents of that column into multiple smaller ones and then use those, the string comparisons, although still expensive, would be 'faster' simply because the strings being compared will be significantly shorter than what they're now.

Another thing to consider is if you can optimize the search by avoiding string comparisons altogether or by avoiding queries that will cause a full table scan despite the fact that you've defined indices.

For that you should look at using explain with your query, so that you can get a better understanding of the Query Execution Plan

Quoting the docs (my emphasis):

Depending on the details of your tables, columns, indexes, and the conditions in your WHERE clause, the MySQL optimizer considers many techniques to efficiently perform the lookups involved in an SQL query. ... Your goals are ... to learn the SQL syntax and indexing techniques to improve the plan if you see some inefficient operations.


Edit 1

You've clarified that your name column is actually for user notes. In this case, I think you should consider the following (in addition to what has been mentioned already):

  1. Rename the column to something that correlates to its actual contents
  2. Remove the index from the column
  3. Do not use that column for searching, sorting, or any other operation other than just selecting it to display it (It'd be very rare if it needed to be used for anything else, IMHO.)
  4. Optionally, consider changing the column into a text type and you won't have to worry that much about user essays getting truncated without warning (unless the GUI has enforced the same input length limit to the user)
code_dredd
  • 5,915
  • 1
  • 25
  • 53
  • 1
    Well said. This captures the issues perfectly. – elixenide Oct 19 '15 at 04:44
  • The 'name' actually contains users notes. But your right that the size can be much smaller. 99% of the time people enter no more than maybe 100 characters and so forcing it smaller would be acceptable. I just tried an 'explain' on the query. With a size of 250 it uses an index and is super fast. Leaving it at 4096 characters caused it to table scan and take a relatively long time. Seems the size affects the index used. – Phil Wright Oct 19 '15 at 04:48
  • dare I ask why you have an index on user notes ? Is the following string something that deserves an index: "I want a kite for christmas" – Drew Oct 19 '15 at 04:51
  • @PhilWright: I'd argue that an index for user notes is unnecessary because you shouldn't search for them that way. Search for user notes based on the `id` of the note associated with the `id` of the user. Imagine if a book had the contents of an entire chapter in the index, instead of just the title, and how inefficient that would really be. But that's what you're doing in a sense. – code_dredd Oct 19 '15 at 04:53
0
INDEX(cid, tid, name)

will speed up the query considerably.

However, that assumes name is a civilized length, such as under 255. If you must have a longer name, then this is the best you can do:

INDEX(cid, tid)  -- (in either order)

No, a "prefix" index will not help: INDEX(cid, tid, name(99)). Prefix indexes are useless for ORDER BY.

All four columns have individual indexes.

Individual indexes are not the same as composite indexes. Sometimes they are better; usually they are not.

I provide more details in my Index Cookbook.

Rick James
  • 135,179
  • 13
  • 127
  • 222