0

I have two tables and the join between them takes forever. Created relevant index but apparently its not being used is what my guess is.

Table 1:

CREATE TABLE `INTRADAY_PRICES_CASH` (
`TradingSymbol` varchar(100) CHARACTER SET latin1 NOT NULL,
`SnapshotDateTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
CURRENT_TIMESTAMP,
`Open` double NOT NULL,
`Low` double NOT NULL,
`High` double NOT NULL,
`Close` double NOT NULL,
`Volume` double NOT NULL,
`SnapshotDate` date NOT NULL,
`SnapshotTime` time NOT NULL,
 `UpdateToDBTime` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 PRIMARY KEY (`TradingSymbol`,`SnapshotDateTime`),
 KEY `IDX_SNAPSHOTDATE` (`SnapshotDate`),
 KEY `IDX_SNAPSHOTDATETIME` (`SnapshotDateTime`),
 KEY `IDX_SNAPSHOTTIME` (`SnapshotTime`),
 KEY `IDX_TRADINGSYMBOL` (`TradingSymbol`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Table 2:

CREATE TABLE `ACTIVE_INSTRUMENTS_CASH` (
  `INSTRUMENT_ID` bigint(20) NOT NULL AUTO_INCREMENT,
  `INSTRUMENT_TOKEN` bigint(20) DEFAULT NULL,
  `EXCHANGE_TOKEN` bigint(20) DEFAULT NULL,
  `TRADING_SYMBOL` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `INSTRUMENT_NAME` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL,
  `EXPIRY` date DEFAULT NULL,
  `LOT_SIZE` double DEFAULT NULL,
  `TICK_SIZE` float DEFAULT NULL,
  `INSTRUMENT_TYPE` varbinary(10) DEFAULT NULL,
  `SEGMENT` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `EXCHANGE` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `AS_ON_DATE` date NOT NULL,
  PRIMARY KEY (`INSTRUMENT_ID`),
  UNIQUE KEY `IND_AS_ON_DATE` (`AS_ON_DATE`,`TRADING_SYMBOL`),
  KEY `IND1` (`AS_ON_DATE`),
  KEY `IND2` (`INSTRUMENT_TOKEN`),
  KEY `IND3` (`TRADING_SYMBOL`),
  KEY `IND4` (`INSTRUMENT_TYPE`)
) ENGINE=InnoDB AUTO_INCREMENT=196606 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Table 2 has TRADING_SYMBOL and AS_ON_DATE as PK. For one date, there could be multiple trading symbols in this table.

In Table 1, for that same trading symbol and date combination, we will have multiple rows that represent various prices of the symbol on various minutes of the same date.

Now, I want to join these tables to understand how many rows are matching for the trading symbol and date combination in both the tables.

SELECT COUNT(*) FROM INTRADAY_PRICES_CASH C, ACTIVE_INSTRUMENTS_CASH I
WHERE C.`SnapshotDate`>'2017-08-14'
AND I.`TRADING_SYMBOL`=C.`TradingSymbol`
AND I.`AS_ON_DATE`=C.`SnapshotDate`

Explain shows that it is using:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  I   range   IND_AS_ON_DATE,IND1,IND3    IND_AS_ON_DATE  3   \N  15066   Using where; Using index
1   SIMPLE  C   ref IDX_SNAPSHOTDATE    IDX_SNAPSHOTDATE    3   u754793479_stock.I.AS_ON_DATE   771 Using where; Using index

Strangely I dont see it using any of the indices in both the tables pertaining to TradingSymbol. That is possibly causing this to be delayed.

Is that understanding correct and if so, how to rectify so that it uses the index pertaining to trading symbol. Do I need a full text index on trading symbol for it to use this.

EDIT

Here are some additional clarifications:

INTRADAY_PRICES_CASH contains 69700675 rows. ACTIVE_INSTRUMENTS_CASH contains 190177 rows.

SELECT COUNT(*) FROM INTRADAY_PRICES_CASH C WHERE C.SnapshotDate>'2017-08-14' returns 3911679 rows

My hosting provider has not provided access to innodb_index_stats. Hence cannot fire this query: SELECT *, stat_value * @@innodb_page_size FROM mysql.innodb_index_stats WHERE table_name = 'INTRADAY_PRICES_CASH' and stat_name = 'size' and indexname = 'IDX_SNAPSHOTDATETRADINGSYMBOL';

select @@innodb_buffer_pool_size is 133.2 G

I have 16GB RAM.

Kallol
  • 264
  • 1
  • 12
  • You need the additional index `INTRADAY_PRICES_CASH(SnapshotDate, TradingSymbol)` or `ACTIVE_INSTRUMENTS_CASH(TRADING_SYMBOL, AS_ON_DATE)`. The order of the columns matters, and you currently have a different for your 2-column-indexes in your two tables. – Solarflare Aug 25 '17 at 11:57
  • Thanks. Added both. Though explain shows it is using INTRADAY_PRICES_CASH(SnapshotDate, TradingSymbol), still its taking 36 minutes. These are the index: `KEY `IND5` (`TRADING_SYMBOL`,`AS_ON_DATE`)` and `KEY `IDX_SNAPSHOTDATETRADINGSYMBOL` (`SnapshotDate`,`TradingSymbol`)`. Explain is showing: table type possible_keys key I range IND_AS_ON_DATE,IND1,IND3,IND5 IND_AS_ON_DATE C ref IDX_SNAPSHOTDATE,IDX_SNAPSHOTDATETRADINGSYMBOL IDX_SNAPSHOTDATETRADINGSYMBOL – Kallol Aug 25 '17 at 13:54
  • What is the count you are getting and how many rows do you have in your tables? This is a very simple query that can be completely calculated from indexes, it should not take 36 minutes even if you would have a billion rows in your tables. What does `TradingSymbol` contain/does it contain more likely 10 or more likely 100 characters? Replacing it with an int that represents that symbol could improve the speed (relatively to the average length of your symbols). – Solarflare Aug 25 '17 at 14:15
  • INTRADAY_PRICES_CASH contains 69700675 rows. ACTIVE_INSTRUMENTS_CASH contains 190177 rows. I will check the charset and come back. TradingSymbol contains characters representing stock symbols...like APPLE, AMAZON, etc. – Kallol Aug 25 '17 at 14:23
  • AH. I just saw what is causing this. Both `tradingSymbol` and `TRADING_SYMBOL` need to have the same characterset / collation, so replace one (you should probably use utf8 for both to be most flexible). More preferable, as mentioned, would be to replace it with an int, but fixing the characterset is easier and should already result in a big improvement. – Solarflare Aug 25 '17 at 14:29
  • 5 minutes still seem much. What do you get for `SELECT COUNT(*) FROM INTRADAY_PRICES_CASH C WHERE C.`SnapshotDate`>'2017-08-14'`? What do you get for `SELECT *, stat_value * @@innodb_page_size FROM mysql.innodb_index_stats WHERE table_name = 'INTRADAY_PRICES_CASH' and stat_name = 'size' and indexname = 'IDX_SNAPSHOTDATETRADINGSYMBOL';`? What is, as Rick suggested, `select @@innodb_buffer_pool_size;`? For readability, please add these to your question (not in the comments), also the new `explain`. – Solarflare Aug 26 '17 at 07:24

1 Answers1

0

Please pick better aliases -- both tables have initial C and I.

Please use the preferred JOIN...ON.

SELECT  COUNT(*)
    FROM  INTRADAY_PRICES_CASH AS p
    JOIN  ACTIVE_INSTRUMENTS_CASH AS a
          ON  a.`TRADING_SYMBOL` = p.`TradingSymbol`
         AND  a.`AS_ON_DATE`     = p.`SnapshotDate` 
    WHERE  p.`SnapshotDate`>'2017-08-14'

Notice how key_len is only 3, implying that it is only using the DATE part of the index.

Use the same CHARACTER SET and COLLATION for TRADING_SYMBOL and TradingSymbol. (Suggest you spell them the same, too.) Different collations prevents use of indexes (for that column).

Wouldn't this give you the same COUNT(*)?

SELECT  COUNT(*)
    FROM  INTRADAY_PRICES_CASH
    WHERE  p.`SnapshotDate`>'2017-08-14';

If you are starting on the 15th, why not say >= ... 15 instead of > ... 14? This would then work for DATETIME as well as DATE.

Does some other table use INSTRUMENT_ID? Or can that column be removed and promote the UNIQUE key to PRIMARY?

If you really do need the join, then I may have other index suggestions.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks for your pointers to make this better. The count from `INTRADAY_PRICES_CASH` gives the same count. But I want to make sure I have my downstream queries working on only the set from `INTRADAY_PRICES_CASH` that have the same representation in `ACTIVE_INSTRUMENTS_CASH`. I have changed the collation as suggested by @Solarflare. Performance has imporved but it still takes 5 minutes for this to run – Kallol Aug 26 '17 at 04:05
  • How many rows in `INTRADAY_PRICES_CASH`? What is the resulting `COUNT(*)`? What is the value of `innodb_buffer_pool_size`? How much RAM do you have? – Rick James Aug 26 '17 at 05:28