0

Hello this is my query and is taking around an hour to complete. The table is 7 million records in innodb. I have an index on (timeStamp,symbol). Test7() is a function for coming up with previous week day. I am using mysql 5.3. Any suggestions on how to speed this up? Thank you

'''''''''''

SELECT Archive_dataPolygon.timeStamp,
Archive_dataPolygon.symbol,
Archive_dataPolygon.open,
Archive_dataPolygon.high,
Archive_dataPolygon.low,
Archive_dataPolygon.close,
Archive_dataPolygon.volume,
Archive_dataPolygon.afterHours,
Archive_dataPolygon.preMarket,
d2.timeStamp,
d2.symbol,
d2.open,
d2.high,
d2.low,
d2.close,
d2.volume,
d2.afterHours,
d2.preMarket,
d3.timeStamp,
d3.symbol,
d3.open,
d3.high,
d3.low,
d3.close,
d3.volume,
d3.afterHours,
d3.preMarket,
d4.timeStamp,
d4.symbol,
d4.open,
d4.high,
d4.low,
d4.close,
d4.volume,
d4.afterHours,
d4.preMarket,
round((Archive_dataPolygon.high-Archive_dataPolygon.open)/Archive_dataPolygon.open,3) as 'quoteChangePercent'

from Archive_dataPolygon
inner join Archive_dataPolygon d2 force index(timeStampSymbol) on d2.symbol=Archive_dataPolygon.symbol and d2.timeStamp = test7(Archive_dataPolygon.timeStamp)
inner join Archive_dataPolygon d3 on d3.symbol=Archive_dataPolygon.symbol and d3.timeStamp = test7(d2.timeStamp)
inner join Archive_dataPolygon d4 on d4.symbol=Archive_dataPolygon.symbol and d4.timeStamp = test7(d3.timeStamp)

where Archive_dataPolygon.timeStamp < NOW() - interval 4 day and Archive_dataPolygon.timeStamp > NOW() - INTERVAL 20000 DAY and Archive_dataPolygon.open >= 50 and Archive_dataPolygon.open <= 70 and Archive_dataPolygon.symbol not IN("%Nasdaq%") and round((Archive_dataPolygon.high-Archive_dataPolygon.open)/Archive_dataPolygon.open,3)=%quoteChangePercent%


HAVING isnull(Archive_dataPolygon.timeStamp)=false and isnull(d2.timeStamp)=false and isnull(d3.timeStamp)=false and isnull(d4.timeStamp)=false

order by Archive_dataPolygon.timeStamp desc
LIMIT 2000

'''''''''''''''

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    Functions cannot use indexes, so whatever `test7()` is, it's ruining performance (assuming that `timestamp` is indexed). `IN("%Nasdaq%")` cannot use an index; this would appear to be a flaw in your data model. And `%quoteChangePercent%` looks like an error. – Strawberry Jan 16 '21 at 08:25
  • 1
    `HAVING` is typically reserved for aggregate results, so I'm suspicious of its use here. `WHERE... IS NOT NULL` would be more usual. – Strawberry Jan 16 '21 at 08:27
  • And while it makes no difference to performance, there's no harm in aliasing ALL table instances. – Strawberry Jan 16 '21 at 08:29
  • And... your result set is difficult to use, as distinguishing between d2, d3, etc is tricky – Strawberry Jan 16 '21 at 08:31

1 Answers1

0

Instead of

d2.timeStamp = test7(Archive_dataPolygon.timeStamp)

use

Archive_dataPolygon.timeStamp = d2.timeStamp - INTERVAL 7 DAY

or

d2.timeStamp = Archive_dataPolygon.timeStamp + INTERVAL 7 DAY

(Try each way; I can't easily tell which would be better.)

I hope that timeStamp is of type "DATE". Please provide SHOW CREATE TABLE for each table.

There is no MySQL 5.3; perhaps you meant MariaDB 5.3? Provide SHOW VARIABLES LIKE 'version';

This does no filtering, correct? If not, leave it out:

Archive_dataPolygon.timeStamp > NOW() - INTERVAL 20000 DAY

What is this?

HAVING  isnull(d2.timeStamp)=false

It could (should) be simplified to

HAVING  d2.timeStamp IS NOT NULL

At that point, the test can be put into the WHERE clause (since you are using JOIN and not LEFT JOIN).

If you intended to have LEFT JOIN please elaborate on your goals.

This seems strange:

order by  Archive_dataPolygon.timeStamp desc LIMIT  2000

You want the very oldest records, regardless of symbol?

Normally I would recommend this order for the index: (symbol, timeStamp). In general, it is more optimal to use column(s) that are tested with = first in any composite index. [However, it may not help performance for this query.] Include two indexes.

The real killer is probably test7(). I hope that after my other suggestions, it will touch only 2000 rows -- times 4 (due to the JOINs).

If you still need help after my suggestions, please provide EXPLAIN SELECT.

Rick James
  • 135,179
  • 13
  • 127
  • 222