2

please I have the same problem as I found here

MySQL - Selecting data from multiple tables all with same structure but different data ,

I have to select data from many MySQL tables with identical structure, but different data (split up into table_0, table_1, table_2 etc to table_5 to distribute millions of records of data).

The hardware generating the data records for each device moves from table to table according to timestamp field, which is NOT unique. e.g. 50 records in table_0 may have the same timestamp. When the data gets to the end of table_5, it goes back to table_0 to start overwriting the data there. I need to get the data on each device within a time range.

Each table's data columns (for table_0, table_1... up to table_5):
timestamp, robotGroupID, robotID, sensor1, sensor2, sensor3, ... (many of them)

However the tables are HUGE and the UNION ALL (I read its faster than DISTINCT) takes forever to execute, even with just two tables let alone 6. e.g. I will illustrate for two tables below.

MySQL statement in PHP: (illustrated for just sensor 1, sensor 2 and sensor 3)
(SELECT sensor1, sensor2, sensor3 FROM table_0 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)
UNION ALL
(SELECT sensor1, sensor2, sensor3 FROM table_1 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)

N.B it is the exact same query except for the table name. Sensor data for a robot within a time range may span none, one, or more of the tables at once.

I cannot use LIMIT because the number of reports from robots within each time range cannot be known ahead of time. I cant use the MERGE STORAGE ENGINE cos I only have read-only access to the company's database.

I have an idea to use count(robotID) or so on each table to check before running queries but Im not sure how to go about this cos I'm quite a novice.

Please how do you think I can make this work faster for 6 tables and many more columns since there are many more columns than illustrated? Thanks in advance!

Community
  • 1
  • 1
Cogicero
  • 1,514
  • 2
  • 17
  • 36
  • is timestamp indexed? could you provide the EXPLAIN of the query posted – ejrowley Sep 03 '10 at 10:44
  • Thanks, ejrowley. I didnt know what an EXPLAIN is but I googled it and did what I found there. Is this what you require? N.B. The tables have no primary keys. ------------------------------------------------------------------- table: table_0 | type: range | possible_keys: IDX_TIME_UQ | key: IDX_TIME_UQ | key_len ref: 8 | rows: 488590 | Extra: using where ------------------------------------------------------------------- table: table_1 | type: range | possible_keys: IDX_TIME_UQ | key: IDX_TIME_UQ | key_len ref: 8 | rows: 675489 | Extra: using where – Cogicero Sep 03 '10 at 11:17
  • How many results are you expecting from the query, are the lots of different robotID's or just a few. Also how quick, relatively speaking is the query when just run on table_1. – ejrowley Sep 03 '10 at 12:21
  • @ejrwoley: The number of results from the Query depends on the time range selected because each sensor has a different value each minute (the greater the difference between timeStart and timeStop, the greater the number of results). There are many robotIDs, approx 500. This entire query is in a for loop to poll the data for all the robots. – Cogicero Sep 03 '10 at 12:28
  • By the way, I just ran the queries on all 6 tables using UNION ALL and the script hung until I set PHP time limit to zero, and memory limit to 1Gb! For each table, I did a mysql count query to check if there are rows, and only if there were, then to fetch the columns. I ran the query for just ONE robot in the for loop and it took 25 minutes. Please are there any more ideas to make this faster? – Cogicero Sep 03 '10 at 12:33
  • @ejrowley: Sorry I mispelt your name previously. When run on only table_1 for only one robot, the time it takes can vary from 2 seconds (no data found in table_1) to as much as a minute. Those values rise greatly when more tables are added. – Cogicero Sep 03 '10 at 12:37
  • I tried out some ideas and its much faster now so Im answering my own question. Thanks for your help! – Cogicero Sep 03 '10 at 13:22

3 Answers3

1

Are the fields RobotID and Timestamp indexed?

I would add a multi-field index of ( RobotId, timestamp ) at the very least.

You say you have read only access to the tables, so can you request this index to be added? I'm sure it will help in both your original and updated queries posted.

Dave Rix
  • 1,621
  • 1
  • 11
  • 17
  • Thanks a lot Dave Rix. But, no, many companies use this same database so my company's request was flatly dismissed. I wish I could add such an index! Thanks :) – Cogicero Sep 03 '10 at 14:16
0

If you can convince them to let you change the database structure, you can GREATLY optimize the layout of your database with the help of MySQL Partitioning. You'll want to research "Range Partitioning", and set up partitioning rules that will tell MySQL to automatically sort your data into invisible subtables for way quicker SELECT results. You won't even need multiple tables.

See http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html

adavea
  • 1,535
  • 1
  • 19
  • 25
0

I must confess Im still a novice PHP/MySQL coder, but with many ideas; so my code is probably "dirty".

So I solved the problem this way in order to move forward, but please better solutions are welcome. As for any strange syntax, I am using a database class built upon the PHP PDO because I am using many different RBDMS types on this project.

For the $myQuery_start variable, I added the names of the other columns as well as sensors 1 to 3.
$myQuery_start = "(SELECT sensor1, sensor2, sensor3 FROM ";
$myQueryCount_start = "(SELECT COUNT(*) FROM ";
$myQuery_stop = " WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop.")";

$count_0 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_0".$myQuery_stop)->fetchColumn();
$count_1 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_1".$myQuery_stop)->fetchColumn();
$count_2 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_2".$myQuery_stop)->fetchColumn();
$count_3 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_3".$myQuery_stop)->fetchColumn();
$count_4 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_4".$myQuery_stop)->fetchColumn();
$count_5 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_5".$myQuery_stop)->fetchColumn();

And now I check to see if UNION ALL needs to be appended to each table's query or not. No need to have a UNION ALL if there is no data record to attach in the next table.
$union_0 = (($count_1 + $count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_1 = (($count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_2 = (($count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_3 = (($count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_4 = (($count_5) > 0)?" UNION ALL ":"";

and now we build up the table queries and combine to form the full query

$query_0 = ($count_0 > 0)?$myQuery_start."ip_minute_stats_0".$myQuery_stop.$union_0:"";
$query_1 = ($count_1 > 0)?$myQuery_start."ip_minute_stats_1".$myQuery_stop.$union_1:"";
$query_2 = ($count_2 > 0)?$myQuery_start."ip_minute_stats_2".$myQuery_stop.$union_2:"";
$query_3 = ($count_3 > 0)?$myQuery_start."ip_minute_stats_3".$myQuery_stop.$union_3:"";
$query_4 = ($count_4 > 0)?$myQuery_start."ip_minute_stats_4".$myQuery_stop.$union_4:"";
$query_5 = ($count_5 > 0)?$myQuery_start."ip_minute_stats_5".$myQuery_stop:"";

Then concatenated:
$myQuery = $query_0.$query_1.$query_2.$query_3.$query_4.$query_5;
And finally $myQuery is executed to produce all the data as required.

At least this is roughly 8 times faster than the previous way I used UNION ALL, so I think this is valid. Any suggested further optimization?

Cogicero
  • 1,514
  • 2
  • 17
  • 36
  • Please, more answers (or edits on the above answer) are welcome. My Boss still thinks this runs too slowly :(.I want to know I have tried my best before I move on. Thanks. – Cogicero Sep 03 '10 at 18:04