1

The table 'reading' contains readings taken every 40s, for today. The query returns averages for 180s periods. 'time_stamp' is indexed. The query below returns a reasonable number of rows (a few hundred) but visits ALL rows and get slower the bigger the table gets. WHERE clause does not seem to be restricting it to today's rows only.

EXPLAIN SELECT DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') , AVG(temp_c) FROM reading WHERE DATE(time_stamp) = CURDATE() GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180)

Table schema: CREATE TABLE reading ( id bigint(20) NOT NULL AUTO_INCREMENT, time_stamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, temp_c float NOT NULL, pressure_hpa float NOT NULL, wind_speed_kt int(11) NOT NULL, wind_dir_degree int(11) NOT NULL, rain_mm float NOT NULL, rain_day_mm float NOT NULL, wind_gust_kt int(11) NOT NULL, humidity float DEFAULT NULL, PRIMARY KEY (id), KEY time_stamp (time_stamp), KEY time_stamp_idx (time_stamp) ) ENGINE=InnoDB AUTO_INCREMENT=1747097 DEFAULT CHARSET=latin1;

Sample data

quanglewangle
  • 581
  • 1
  • 5
  • 14
  • The WHERE goes before the GROUP BY. You could verify this by removing the WHERE (or perhaps by the AVG value) :) – SAS Jun 05 '17 at 11:00
  • Also, the query seems to select all the current day's records so could get measurably slower in the afternoons. – SAS Jun 05 '17 at 11:01
  • Could you please let me know what `round(UNIX_TIMESTAMP(time_stamp) / 180)` does? If you apply `UNIX_TIMESTAMP` function to time_stamp filed, it doesn't use index. – harshavmb Jun 05 '17 at 11:14
  • @harshavmb `round(UNIX_TIMESTAMP(time_stamp) / 180)` makes time periods off 180 seconds. – Raymond Nijland Jun 05 '17 at 11:20
  • @quanglewangle can you post the table schema and some example data? – Raymond Nijland Jun 05 '17 at 11:21
  • I guess mysql is not able to use the index due to the calcutaions. If possible, try adding columns holding the computed values precalculated on insert, so you don't have to do it in the select. – SAS Jun 05 '17 at 11:33
  • Added schema and sample data – quanglewangle Jun 05 '17 at 11:38
  • Could you please run `EXPLAIN SELECT DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') , AVG(temp_c) FROM reading WHERE DATE(time_stamp) = CURDATE() GROUP BY DATE_SUB(time_stamp, INTERVAL 180 SECOND)`? Any improvements in the execution plan? – harshavmb Jun 05 '17 at 11:41
  • # id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra '1', 'SIMPLE', 'reading', 'ALL', NULL, NULL, NULL, NULL, '15044', 'Using where; Using temporary; Using filesort' So no difference @harshavmb Same number of rows visited – quanglewangle Jun 05 '17 at 11:52
  • yeah, I too realised. `DATE_SUB` is also doing the samething as `UNIX_TIMESTAMP`. Will have to use between or some other technique. Will let you know – harshavmb Jun 05 '17 at 11:59
  • Can you try this please? `EXPLAIN SELECT DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') , AVG(temp_c) FROM reading WHERE time_stamp >= '2017-06-05 00:00:00' GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180)`. I expect some improvement. – harshavmb Jun 05 '17 at 12:37
  • Yup, @harshavmb, that does it! It uses the time_stamp as key and only visits 8-900 rows instead of 15000 or so. Replacing'2017-06-05 00:00:00' in the WHERE clause with CURDATE() also works. It looks like the fix lies in using "time_stamp >=" instead of "DATE(time_stamp)" in the WHERE clause allows the index to be used. – quanglewangle Jun 05 '17 at 13:54
  • Glad that it helped you.. :) We can tweak more by modifying `round(UNIX_TIMESTAMP(time_stamp) / 180)`. But, am not sure how to do it. Shall I write the same as an answer? – harshavmb Jun 05 '17 at 14:01
  • yes please and I will accept it – quanglewangle Jun 05 '17 at 14:03

1 Answers1

2
EXPLAIN SELECT 
    DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
    AVG(temp_c) 
FROM reading
WHERE DATE(time_stamp) = CURDATE() 
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180) 

When the above query is executed, MySQL optimizer isn't interested in index scan (could be because of cost factor) rather full table scan is initiated and the issue appears to be because of WHERE DATE(time_stamp) = CURDATE().

Having changed your where clause to time_stamp >= CURDATE(), I've seen index being used and less number of rows were fetched shunning full scan.

Hence, your final query will be:

EXPLAIN SELECT 
    DATE_FORMAT(time_stamp, '%Y-%m-%dT%T+00:00') ,
    AVG(temp_c) 
FROM reading
WHERE time_stamp >= CURDATE() 
GROUP BY round(UNIX_TIMESTAMP(time_stamp) / 180);

I suspect date(time_stamp) isn't that efficient with index. Similar topic was discussed here (see ypercube's answer).

The above query can be further improved by choosing an alternate of round(UNIX_TIMESTAMP(time_stamp) / 180) as UNIX_TIMESTAMP(timestamp) doesn't use index. But, I'm not trying furthermore.

Hope this helps!

harshavmb
  • 3,404
  • 3
  • 21
  • 55