
Hope you will allow me to pick your brains so I can gain some knowledge in the process. We have three tables: data_product, data_issuer, and data_accountbalance.

CREATE TABLE `data_issuer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`issuer_name` varchar(128) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB


CREATE TABLE `data_product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`issuer_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_product_name_issuer_id_260fec65_uniq` (`name`,`issuer_id`),
KEY `data_product_issuer_id_d07fa696_fk_data_issuer_id` (`issuer_id`),
CONSTRAINT `data_product_issuer_id_d07fa696_fk_data_issuer_id` FOREIGN KEY (`issuer_id`) REFERENCES `data_issuer` (`id`)
) ENGINE=InnoDB


CREATE TABLE `data_accountbalance` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`nominee_name` varchar(128) NOT NULL,
`beneficiary_name` varchar(128) NOT NULL,
`nominee_id` varchar(128) NOT NULL,
`account_id` varchar(16) NOT NULL,
`product_id` int(11) NOT NULL,
`register_id` int(11) DEFAULT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `data_accountbalance_date_product_id_nominee__7b8d2c6a_uniq` (`date`,`product_id`,`nominee_id`,`beneficiary_name`),
 KEY `data_accountbalance_product_id_nominee_id_date_8ef8754f_idx` (`product_id`,`nominee_id`,`date`),
 KEY `data_accountbalance_register_id_4e78ec16_fk_data_register_id` (`register_id`),
 KEY `data_accountbalance_product_id_date_nominee_i_c3a41e39_idx` (`product_id`,`date`,`nominee_id`,`beneficiary_name`,`balance_amount`),
 CONSTRAINT `data_accountbalance_product_id_acfb18f6_fk_data_product_id` FOREIGN KEY (`product_id`) REFERENCES `data_product` (`id`),
 CONSTRAINT `data_accountbalance_register_id_4e78ec16_fk_data_register_id` FOREIGN KEY (`register_id`) REFERENCES `data_register` (`id`)
 ) ENGINE=InnoDB

When running the query below, the system takes about an hour to respond -

SELECT SQL_NO_CACHE *
FROM data_product
INNER JOIN `data_issuer` ON (`data_issuer`.`id` = `data_product`.`issuer_id`)
INNER JOIN `data_accountbalance` ON (`data_accountbalance`.`product_id` = `data_product`.`id`)
LIMIT 100000000;

Both data_issuer and data_product have only a few hundred records in them, but data_accountbalance is huge, with about 15,384,358 records.

The explain plan produced is below -

id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | data_product | NULL | ALL | PRIMARY,data_product_issuer_id_d07fa696_fk_data_issuer_id | NULL | NULL | NULL | 459 | 100 |
1 | SIMPLE | data_issuer | NULL | eq_ref | PRIMARY | PRIMARY | 4 | pnl.data_product.issuer_id | 1 | 100 |
1 | SIMPLE | data_accountbalance | NULL | ref | data_accountbalance_product_id_nominee_id_date_8ef8754f_idx,data_accountbalance_product_id_date_nominee_i_c3a41e39_idx | data_accountbalance_product_id_date_nominee_i_c3a41e39_idx | 4 | pnl.data_product.id | 493 | 100 |

Can someone help tune the query so it does not take an hour to run please? Appreciate any pointers you might have for me.

Gayathri
  • I think if you create an index on the `data_accountbalance`.`product_id` column it will be optimized – Rams Jan 10 '18 at 17:47
  • @Rams - I tried your suggestion but no joy I'm afraid. The query still takes the same time. – Gayathri Jan 10 '18 at 21:24
  • How many rows does it return? – Rick James Jan 11 '18 at 00:29
  • @RickJames - 15,402,355 rows are returned by the query – Gayathri Jan 11 '18 at 08:58
  • @Gayathri - What will you do with that much data all at once? Will it choke the client? Do you need to run this more than once? If so, why? Maybe there is some way to solve the _real_ goal without having to repeatedly run an hour-long query. – Rick James Jan 11 '18 at 09:05
  • @RickJames - As I mentioned to Dave below, the query extracts data for a report. The Business team then use the extract to make insights. – Gayathri Jan 11 '18 at 10:00
  • Can't you do the summarization in the `SELECT`?? – Rick James Jan 11 '18 at 14:31
  • @RickJames - Tableau allows the Business to summarise and analyse data in different ways. Rather than come to us every time, the Business simply use the extract and perform varied summaries and this is one of the main advantages of Tableau. Our job stops with providing them the different extracts – Gayathri Jan 12 '18 at 09:48
  • @Gayathri - Sounds like Tableau would work better from a file instead of a database. – Rick James Jan 12 '18 at 13:07
  • @RickJames - it's still going to involve an extract from the db isn't it? – Gayathri Jan 12 '18 at 16:46
  • @Gayathri not if it is a file. I'm talking about getting rid of the database, since you don't seem to need the relational properties it provides. – Rick James Jan 12 '18 at 18:11

1 Answer


If your query is literally what you are showing there, then that's the problem: it has no WHERE clause.

That query will literally return 15,384,358 rows. Because the two smaller tables are typical domain tables with NOT NULL relations all the way across, the joins filter nothing out: you get exactly one result row for every row in data_accountbalance.
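
A quick way to check that assumption (that the joins neither filter nor multiply rows) is to compare row counts. This is just a sanity-check sketch, not part of the extract itself:

SELECT COUNT(*) FROM data_accountbalance;

-- With NOT NULL foreign keys on every hop, this should return the same number as above
SELECT COUNT(*)
FROM data_accountbalance
INNER JOIN data_product ON data_product.id = data_accountbalance.product_id
INNER JOIN data_issuer ON data_issuer.id = data_product.issuer_id;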

The actual time cost will probably be in creating a massive temporary table (though I'm not sure about that). Since you are effectively downloading the entire database, all 3 tables, you could look into tuning the temp-table settings in your MySQL config to possibly speed this up, or, preferably, arrange for the client to read the results as MySQL produces them instead of buffering everything (which avoids a temp table). Alternatively, maybe your script that runs this query is trying to read the whole data set into memory, which takes a long time?
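
If the extract really is just a flat file for Tableau, another option is to have the server write the result straight to a file instead of shipping it through a client that buffers it. A minimal sketch, assuming you have the FILE privilege and that /tmp/balances.tsv is a path permitted by secure_file_priv (the path and those permissions are assumptions, and note the file lands on the database server, not your machine):

SELECT SQL_NO_CACHE *
FROM data_product
INNER JOIN data_issuer ON data_issuer.id = data_product.issuer_id
INNER JOIN data_accountbalance ON data_accountbalance.product_id = data_product.id
INTO OUTFILE '/tmp/balances.tsv'   -- hypothetical path; must satisfy secure_file_priv
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';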

Is there a particular reason to download all the data? Usually you download only the data you mean to operate on, or have MySQL do the grouping, summing, etc. and then return the answer you wanted based on all the data.
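
For instance, if the report ultimately needs per-issuer, per-product, per-month totals, a pre-aggregated extract along these lines is far smaller than 15 million rows. Treat it only as a sketch: balance_amount appears in one of your index definitions but not in the column list you posted, so I am assuming it exists on data_accountbalance:

SELECT data_issuer.issuer_name,
       data_product.name AS product_name,
       DATE_FORMAT(data_accountbalance.date, '%Y-%m') AS month,
       COUNT(*) AS rows_in_month,
       SUM(data_accountbalance.balance_amount) AS total_balance  -- balance_amount assumed from the index definition
FROM data_accountbalance
INNER JOIN data_product ON data_product.id = data_accountbalance.product_id
INNER JOIN data_issuer ON data_issuer.id = data_product.issuer_id
GROUP BY data_issuer.issuer_name, product_name, month;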

How many rows did you expect the query to return? If you are thinking something less than 15 million, then the answer is to add some kind of WHERE clause, or an aggregate function. Whichever tables and columns you use to reduce the result set, those columns will have to be indexed.
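
As a purely hypothetical example, if the business only needs the most recent month, a date filter like the one below keeps the extract small, and your existing unique key on (date, product_id, nominee_id, beneficiary_name) already has date as its leading column, so it can serve the range scan:

SELECT SQL_NO_CACHE *
FROM data_accountbalance
INNER JOIN data_product ON data_product.id = data_accountbalance.product_id
INNER JOIN data_issuer ON data_issuer.id = data_product.issuer_id
WHERE data_accountbalance.date >= '2018-01-01'   -- hypothetical cut-off; use whatever period the report needs
  AND data_accountbalance.date < '2018-02-01';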

I hope this helps. :)

  • Thank you Dave for your thoughts. The query is an extract for a Tableau report and is as is (no WHERE). Unfortunately, I need to optimise it as seen. Any other ideas on how to do it? – Gayathri Jan 10 '18 at 20:47
  • Yes, if you are running a report, the goal ultimately is to condense a Lot of data, collated and summed, into a small amount that can be digested by a human. E.g. you want to see the number of nominees per data issuer (GROUP BY data_issuer.id), or by product and month (GROUP BY data_product.id, DATE_FORMAT(data_accountbalance.date, '%Y-%m')). The point is you want MySQL to do the data crunching Before it sends the data off to the reporting software. This just means that your reporting software is only using MySQL as a data-hold, not using its query abilities. Does this help at all? – Dave Buchanan Jan 11 '18 at 16:53
  • Are you sure the hour delay is MySQL _returning_ the data, not your Tableau reporting software _parsing and crunching_ that much data? – Dave Buchanan Jan 11 '18 at 17:04
  • Tableau allows the Business to summarise and analyse data in different ways. Rather than come to us every time, the Business simply use the extract and perform varied summaries and this is one of the main advantages of Tableau. Our job stops with providing them the different extracts. Yes, the delay is in running the extract before any summarisation by the Business team – Gayathri Jan 12 '18 at 09:47
  • I've worked with other reporting software (CrystalReports) where it was adding WHERE and GROUP BY clauses behind the scenes, so that the DB server was doing the hard work. Maybe you can configure Tableau to also do this? Otherwise, the next place to look might be bandwidth. Are you able to watch how much data is being transferred during that hour, and the total data size for those tables? Maybe speeding up the connection from DB to Tableau will help? However, if you can see from the network that nothing is being sent until the very end of that hour delay, then that isn't the problem. – Dave Buchanan Jan 12 '18 at 17:23