3

So I have a table, possibly millions of rows long,

user   | points
---------------
user1  | 10
user2  | 12
user3  | 7
...

and want to SELECT * FROM mytable ORDER BY points LIMIT 100, 1000 Now that works fine, but is horribly slow (on huge tables), since it refuses to use any kind of index, but performs a full table scan. How can I make this more efficient?

My first (obvious) idea was to use an index on points DESC, but then I figured out that MySQL does not support those at all.

Next, I tried to reverse the sign on points, meaning essentially having an ascending index on -points, this didnt help either, since it doesnt use the index for sorting

Lastly, I tried using force index, this yielded barely any performance improvement, since it still fetches the entire table, yet doesnt sort (using filesort: false in EXPLAIN)

I am sure this must be a solved problem, but I did not find any helpful information online. Any hints would be greatly appreciated.

CBenni
  • 555
  • 1
  • 7
  • 20
  • If you've "millions of rows" in your table, you might well just have too large a database for the hardware you're running on. However, the structure itself could have a lot to do with it. Is your database simply user/points? As in, the primary key is a varchar? Or? – Eoghan Jan 24 '17 at 01:09
  • 2
    How is this 'top n'? – Strawberry Jan 24 '17 at 01:09
  • @Strawberry I don't think MySql supports `top n`, does it? – Jorge Campos Jan 24 '17 at 01:13
  • 1
    Top 10 = `ORDER BY points DESC LIMIT 10` – RiggsFolly Jan 24 '17 at 01:15
  • Read this, especially the part about performance questions. http://meta.stackoverflow.com/questions/271055/tips-for-asking-a-good-structured-query-language-sql-question/271056#271056 – O. Jones Jan 24 '17 at 01:22
  • Is this a kind of pagination query? If it is, I recommend you to not rely solely on the `LIMIT` clause. Use other filters to it, like last id fetched and pass it on so instead of `FROM mytable ORDER BY points LIMIT 100, 1000` you would use `FROM mytable where someId > lastSeen ORDER BY points LIMIT 100` I think that alone would do a much better job on the execution plan usage. – Jorge Campos Jan 24 '17 at 01:39
  • @Eoghan I doubt the database is too big for the hardware, if it used the indexes, this would not be an issue – CBenni Jan 24 '17 at 01:50
  • @JorgeCampos Yes, it is; Problem is that the ID in no manner corresponds to the position in the leaderboard. If anything, I could do something like "WHERE points <= previousLeastPoints", which could potentially lead to duplicate rows being returned, especially if the data changes between two calls. Perhaps a cursor-based system would perform better, but if the initial call takes 30+ seconds, theres no point anyways. – CBenni Jan 24 '17 at 01:53
  • Just add the index on `points`. It doesn't have to be descending. – user207421 Jan 24 '17 at 02:42
  • @CBenni I agree, that's why I'm asking what the structure of the table actually is. However, it could also be running on something like a microinstance, which would add to the issues. – Eoghan Jan 24 '17 at 03:26
  • @JorgeCampos MySQL supports the concept, if not the keyword. But this isn't a top n problem. – Strawberry Jan 24 '17 at 08:02
  • @Strawberry Yeah, thanks. I know the concepts I was just in doubt of the keyword which in fact wont. Thanks again. – Jorge Campos Jan 24 '17 at 11:28

1 Answers1

2

Some ways to get better performance from a query.

Never never use SELECT *. It's a rookie mistake. It basically tells the query planner it needs to give you everything. Always enumerate the columns you want in the result set. This is the query you want (assuming you haven't oversimplified your question).

 SELECT user, points
   FROM table
  ORDER BY points
  LIMIT 100,1000

Use a compound index. In the case of your query, a compound index on (points, user) will allow the use of a partial index scan to satisfy your query. That should be faster than a full table sort. MySQL can scan indexes backward or forward, so you don't need to worry about descending order

To add the correct index use a command like this.

ALTER TABLE table ADD INDEX points_user (points, user);

Edit. The suggestion against using SELECT * here is based on (1) my unconfirmed suspicion that the table in question is oversimplified and has other columns in real life, and (2) the inconvenient reality that sometimes the index has to match the query precisely to get best performance results.

I stand by my opinion, based on experience, that using SELECT * in queries with performance sensitivity is not good engineering practice (unless you like the query so much you want to come back to it again and again).

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • 2
    "It basically tells the query planner it needs to give you everything" --- *everything* what? A planner surely knows there are 2 columns, so your query is identical in terms of how query optimiser sees it. – zerkms Jan 24 '17 at 01:26
  • @zerkms That was my train of thought too, but apparently it does make a difference. Regardless, using a compound index seems to have fixed the issue for me! – CBenni Jan 24 '17 at 01:55
  • @CBenni it's a mysql covering index that helps isn't it? I cannot believe `SELECT *` vs `SELECT user, points` would change anything in this particular case. – zerkms Jan 24 '17 at 02:11
  • There are lots of discussion on that matter (`select *` vs `select columns`), one that I like is [this one](http://stackoverflow.com/questions/25093187/is-it-bad-for-performance-to-select-all-columns) – Jorge Campos Jan 24 '17 at 03:15
  • @JorgeCampos and it has nothing about `select *` vs selecting the very same columns via explicit enumeration. Those only mention selecting less columns, which is intuitively more efficient (less stuff to transfer) – zerkms Jan 24 '17 at 06:56
  • @zerkms "fewer". I think the point is, it's good practice to name the columns you want - even in the rare instances where you want all the columns. – Strawberry Jan 24 '17 at 07:57
  • @Strawberry every "good practice" should have rationale behind it, have some technical justification and be applied in a context. Just setting up dogmas and following them blindly is not a technical way, especially when one talks about performance. Otherwise we would get millions of people that think that `SELECT *` is slow unconditionally. For the same reason thousands of people around think `COUNT(*)` is slow as well. – zerkms Jan 24 '17 at 08:22