
I have a table with the following structure

SHOW CREATE TABLE data_temperature;

CREATE TABLE `data_temperature` (
  `temperature_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `data_id` bigint(20) unsigned NOT NULL,
  `x_id` varchar(32) DEFAULT NULL,
  `x_sn` varchar(16) DEFAULT NULL,
  `x_unit` char(1) DEFAULT NULL,
  `x_value` decimal(6,2) DEFAULT NULL,
  PRIMARY KEY (`temperature_id`),
  KEY `created` (`created`),
  KEY `data_id` (`data_id`),
  KEY `x_value` (`x_value`)
) ENGINE=InnoDB AUTO_INCREMENT=6274618 DEFAULT CHARSET=latin1

I have a somewhat basic query pulling data from here that's really slow. So I've broken the query down into simpler terms and found that this very simple query is slow (17.52 seconds):

SELECT data_temperature.x_value FROM data_temperature WHERE data_temperature.created BETWEEN '2015-02-02 18:28:42' AND '2015-03-04 18:28:42';

The table has 6,274,617 rows. In fact, SELECT COUNT(*) FROM data_temperature takes 3.66 seconds as well.

The system that this query is running on is my dev system, which is a quad-core running Ubuntu 14.04, 4GB of RAM, and a solid-state drive.

Is this about how long it should take to run a query like this, or am I doing something wrong? Is there a more efficient way to return the data?

Nick Coons
  • Is the datetime field indexed? Your search on that criteria will be greatly helped by that – Hanky Panky Mar 05 '15 at 05:41
  • @Hanky웃Panky I thought it was until you asked. I think the `CREATE TABLE` in my question shows that it is, but if you're asking, maybe I'm reading it incorrectly. – Nick Coons Mar 05 '15 at 05:43
  • run an EXPLAIN before your query to see if your index is even being picked up? – Hanky Panky Mar 05 '15 at 05:47
  • It's not, though I suspect this is because I'm returning 1848520 rows, or 1848520/6274617 =~ 29% of the table, so the optimizer skips the index. If I shorten the range to return fewer rows (say, a week's worth instead of a month's worth), then `EXPLAIN` shows that it does use the index, but it still runs much slower than I would expect (1.47 seconds) for a simple `SELECT` like this. – Nick Coons Mar 05 '15 at 05:51
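The check suggested in the comments looks like this (a sketch; the exact EXPLAIN output will vary by MySQL version and data distribution):

```sql
-- Check whether the optimizer uses the `created` index for the range scan.
-- With roughly 29% of the table matching, the optimizer may choose a full
-- table scan instead, as noted in the comments above.
EXPLAIN SELECT x_value
FROM data_temperature
WHERE created BETWEEN '2015-02-02 18:28:42' AND '2015-03-04 18:28:42';
```

If the `key` column in the output is NULL, the index was skipped; shrinking the date range, as described above, should make `created` appear there.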

1 Answer


Think about how many rows the output has. Think about how much disk space those rows took. Think about how slow disks run. Think about what you will do with all those rows. 17 seconds is reasonable.

Similarly, the COUNT(*) took 3.66 seconds for most of the same reasons.

Let's dig deeper.

SELECT COUNT(*) FROM tbl on an InnoDB table will completely scan one of the indexes, counting the rows as it goes. That index might be about 100MB, all of which had to be fetched from disk unless it was already cached. And it had to look at each of the 6M rows, counting as it went. Total: 3.66 seconds.
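To get a rough sense of how much data such a scan touches, you can check the table and index sizes (the figures are InnoDB estimates, not exact counts):

```sql
-- Data_length and Index_length are approximate on-disk sizes in bytes.
SHOW TABLE STATUS LIKE 'data_temperature';

-- Per-index size estimates, in pages and bytes. This assumes MySQL 5.6+
-- with InnoDB persistent statistics enabled; older versions do not have
-- the mysql.innodb_index_stats table.
SELECT index_name,
       stat_value                       AS pages,
       stat_value * @@innodb_page_size  AS approx_bytes
FROM mysql.innodb_index_stats
WHERE table_name = 'data_temperature'
  AND stat_name  = 'size';
```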

Now let's look at the other query. It is more complex, hence slower.

You have a good index for that WHERE clause: INDEX(created). That index is a BTree. First the engine finds the first entry at or after '2015-02-02 18:28:42'. Then it linearly scans forward until it reaches '2015-03-04 18:28:42'. This probably takes less than 3.66 seconds. But...

For each item in the index, it needs to look up x_value. It does this by first finding the PRIMARY KEY temperature_id, which is stored in that same BTree, right next to created. Using the temperature_id, it traverses another BTree, namely the one with the PK and the data, to find the corresponding row. There it finds x_value. This repeats for every created in the range. The blocks needed for these lookups amount to many more MBs of data.

It is possible to make that SELECT run faster, but the fix only helps this one query: INDEX(created, x_value). This is a "covering" index. That means that all the columns needed for the SELECT are found in the index. Hence, it does not need to reach into the other BTree. This would probably lead to a time of 3.66s or less.
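Concretely, the covering index described above could be added like this (the index name is illustrative; building it will take a while on a 6M-row table and adds write overhead to every INSERT):

```sql
-- Add a covering index so the query can be answered entirely from the
-- index BTree, with no lookup into the clustered (PK) BTree.
ALTER TABLE data_temperature
  ADD INDEX created_x_value (created, x_value);

-- EXPLAIN should now show "Using index" in the Extra column, which is
-- MySQL's indication that the index covers the query.
EXPLAIN SELECT x_value
FROM data_temperature
WHERE created BETWEEN '2015-02-02 18:28:42' AND '2015-03-04 18:28:42';
```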

Rick James
  • Thank for the detailed explanation. I have a very limited number of queries running on these tables, so creating additional indexes may be the way to go. However, as I noted I'm running this on an SSD. The table size is 836MB, so I would have thought that this could be searched more quickly, but you could be right. I just want to make sure that I'm investigating all avenues of verifying that my query and structure are correct. – Nick Coons Mar 05 '15 at 06:26
  • 1
    temperature_id and data_id are BIGINTs, which take 8 bytes each. Probably INT UNSIGNED would work -- 4 bytes each; range of 0..4 billion. Smaller --> faster. Your table is less than 1GB; possibly it was entirely cached, hence we are talking about CPU time, not SSD speed. – Rick James Mar 05 '15 at 06:42
  • I normally use INT instead of BIGINT for most applications, but in this case the tables are likely to grow beyond the confines of 4.2 billion records. I'm going to try your idea of covering indexes to see if that improves the situation. Thanks again! – Nick Coons Mar 05 '15 at 06:44