2

I have a table with four columns

|-----|-----|-----|-----------|
|  a  |  b  |  c  |     d     |
| int | int | int | timestamp |
|-----|-----|-----|-----------|

This table contains more than 100 000 000 records. I have indices on all four columns and one compound index on (a,b,c).

If I run the following query, it works fine (few milliseconds):

SELECT 
  count(*) FROM my_table 
WHERE 
  a = X AND b = Y AND c = Z

It basically returns about 3 thousand elements.

However if I want to add a condition on column d (which is a timestamp):

SELECT 
  count(*) FROM my_table 
WHERE 
  a = X AND b = Y AND c = Z AND d < '2018-01-01T00:00:00'

Then the query response time jumps to minutes.

What am I missing here ?

Shadow
  • 33,525
  • 10
  • 51
  • 64
benjamin.d
  • 2,801
  • 3
  • 23
  • 35
  • You need to add `EXPLAIN` before the query and then you'll know what's wrong and if that condition force a full table scan due to indexing problem (read about `explain` [here](https://www.sitepoint.com/using-explain-to-write-better-mysql-queries/)) – Alon Eitan Jan 30 '18 at 17:23
  • The explain plan did not help much. There was no full scan or anything special – benjamin.d Jan 30 '18 at 22:08

1 Answers1

2

Since you have a compound index on (a,b,c), the first query only needs to use the index (see concept of covering indexes), therefore the results can be served very quickly. The server does not even have to open the table itself.

When you add the criteria on column d, mariadb cannot use the compound index any more as a covering index. The index will still be used to speed up the query to get the records matching the first 3 criteria, but then mariadb has to go to the big table and further filter column d without using any index to get the matching records fort the 4th criterion. Depending on how selective your compound index is, this still can take a lot of time.

You can experiment with creating an index on all 4 columns, but the overall price may be greater than the gain.

Shadow
  • 33,525
  • 10
  • 51
  • 64
  • Thanks for your answer. I understand what you’re saying but the query on a,b,c returns only 3 thousand elements. I thought the dB will just try to filter on d on these 3000 records, which shouldn’t take long. Am i wrong ? – benjamin.d Jan 30 '18 at 19:39
  • Yeah, but to do that mariadb has to open the big table, locate all 3 thousand records (they can be anywhere) and filter them further. – Shadow Jan 30 '18 at 19:56
  • Ok i get it. My only option is to add a fourth column to my index then ? Using a PK can have any beneficial effect ? – benjamin.d Jan 30 '18 at 21:54
  • You have lot more options than just adding column d to the index, but hypotetically discussing all options to speed up queries is beyond an SO question. There are entire books written on this topic. – Shadow Jan 30 '18 at 22:02
  • And for my scenario what option would you suggest me to look into ? – benjamin.d Jan 30 '18 at 22:06
  • I do not know enough about your requirements, nor about your mysql server to responsibly suggest anything. You may want to look into caching - with such large tables it is common practice to take some load off from the database server. – Shadow Jan 30 '18 at 22:18
  • Fair enough. Thanks – benjamin.d Jan 30 '18 at 22:19
  • `d` _must_ be after `a,b,c` in the new compound index. This is because `d < ...` is a "range" test, whereas the others are simple `=` tests. `a,b,c` can be in any order. `INDEX(a,b,c,d)` can be done optimally, and done entirely in the index. And it is unlikely that anything else will slow down due to it. And you may as well toss the existing `INDEX(a,b,c)` as being redundant then. – Rick James Feb 10 '18 at 23:44
  • All of the notes in my previous Comment are covered [_here_](http://mysql.rjweb.org/doc.php/index_cookbook_mysql), but from a different point of view. – Rick James Feb 10 '18 at 23:45