4

The current implementation is a single complex query with multiple joins and temporary tables, but is putting too much stress on my MySQL and is taking upwards of 30+ seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here is the tables involved:

Table: table_companies
Columns: company_id, ...

Table: table_manufacture_line
Columns: line_id, line_name, ...

Table: table_product_stereo
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...

Table: table_product_television
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...

A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by the line_name, then ordered by assembly_datetime and limited depending on the paging. The datetime value is also reliant on timezone and this is applied as part of the query (another JOIN + temp table). line_name is also one of the returned columns.

I was thinking of splitting the line_name filter out from the product union query. Essentially I'd determine the ids of the lines that correspond to the filter, then do a UNION query with a WHERE condition WHERE line_id IN (<results from previous query>). This would cut out the need for joins and temp tables, and I can apply the line_name to line_id and timezone modification in PHP, but I'm not sure this is the best way to go about things.

I have also looked at potentially using Redis, but the large number of individual products is leading to a similarly long wait time when pushing all of the data to Redis via PHP (20-30 seconds), even if it is just pulled in directly from the product tables.

  • Is it possible to tweak the existing queries to increase the efficiency?
  • Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
  • Is there a way to architect the tables better?
  • What other solution(s) would you suggest?

I appreciate any input you can provide.

Edit:

Existing query:

SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100

Here it is formatted for some readability.

SELECT line_name,CONVERT_TZ(datetime,'UTC',tz.timezone) datetime,... 
  FROM (SELECT line_name,datetime,... 
          FROM (SELECT line_id,assembly_datetime datetime,... 
                    FROM table_product_stereos WHERE company_id=# 

                 UNION 
                SELECT line_id,assembly_datetime datetime,... 
                  FROM table_product_televisions 
                 WHERE company_id=# 
               ) AS union_products 
         INNER JOIN table_manufacture_line USING (line_id)
        ) AS products 
INNER JOIN (SELECT timezone 
            FROM table_companies 
            WHERE company_id=# 
            ) AS tz 
ORDER BY datetime DESC LIMIT 0,100

IDs are indexed; Primary keys are the first key for each column.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
DemonGyro
  • 483
  • 3
  • 11
  • 1
    Please show your queries. – O. Jones Oct 02 '14 at 21:05
  • And please provide information about primary keys and indexes. – TheWolf Oct 02 '14 at 21:09
  • 1
    I'm not completely sure, but I think your query is way overcomplex. What exactly is your query supposed to return? Also, could you please format the query so it's more readible? – TheWolf Oct 02 '14 at 21:36
  • 1
    Try `UNION ALL` instead of `UNION`, to skip the deduplicating work inherent in UNION. – O. Jones Oct 02 '14 at 22:22
  • Is there any reason why you have the two products in different tables rather than a single table with a product type? – Mike Oct 02 '14 at 22:42
  • There are more fields excluded that are specific to the tv and stereo tables, pretty much everything in the ... part are different values, and there are a number of them. Also, in many cases, we want to grab tvs separately from stereos. I could argue for the combination of the tables, but the main argument against this is having 2 smaller tables is better than having one large table. – DemonGyro Oct 03 '14 at 13:04

3 Answers3

2

Let's build this query up from its component parts to see what we can optimize.

Observation: you're fetching the 100 most recent rows from the union of two large product tables.

So, let's start by trying to optimize the subqueries fetching stuff from the product tables. Here is one of them.

              SELECT line_id,assembly_datetime datetime,... 
                FROM table_product_stereos 
               WHERE company_id=#

But look, you only need the 100 newest entries here. So, let's add

               ORDER BY assembly_datetime DESC
               LIMIT 100

to this query. Also, you should put a compound index on this table as follows. This will allow both the WHERE and ORDER BY lookups to be satisfied by the index.

 CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)

All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.

If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows

              SELECT t.line_id,t.assembly_datetime datetime,... 
                FROM table_product_stereos AS t
                JOIN table_manufacture_line AS m   ON m.line_id = t.line_id 
                                                  AND m.line_name LIKE '%test'
               WHERE company_id=#
               ORDER BY assembly_datetime DESC
               LIMIT 100

Next, you are using UNION to combine those two query result sets into one. UNION has the function of eliminating duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.

Putting this all together, the innermost sub query becomes this. We have to wrap up the subqueries because SQL is confused by UNION and ORDER BY clauses at the same query level.

           SELECT * FROM (
              SELECT line_id,assembly_datetime datetime,... 
                FROM table_product_stereos 
               WHERE company_id=#
               ORDER BY assembly_datetime DESC 
               LIMIT 100
                         ) AS st
           UNION ALL 
           SELECT * FROM (
             SELECT line_id,assembly_datetime datetime,... 
               FROM table_product_televisions 
              WHERE company_id=#
              ORDER BY assembly_datetime DESC 
              LIMIT 100
                         ) AS tv

That gets you 200 rows. It should get those rows fairly quickly.

200 rows are guaranteed to be enough to give you the 100 most recent items later on after you do your outer ORDER BY ... LIMIT operation. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.

Finally wrap up this query in your outer query material. Join the table_manufacture_line information, and fix up the timezone.

If you do the indexing and the ORDER BY ... LIMIT operation earlier, this query should become very fast.

The comment dialog in your question indicates to me that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows slams performance: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.

You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • I also need to restrict the query by line_name from a search string using "WHERE line_name LIKE '%test'" and return the line_name to display. Would adding a INNER JOIN on table_manufacture_line before the LIMIT be doable? Also, I must return results based on paging, so LIMIT won't completely work for me until after the UNION ALL, correct? – DemonGyro Oct 03 '14 at 13:09
  • Yipes! `LIKE '%string'` is the very slowest query you can make! – O. Jones Oct 03 '14 at 15:01
  • Unfortunately, its a requirement. I'm working on a split query atm where I get the line_id and line_name fields in a separate query (having to only compare against 3-20 lines max), then pass in the line_id values into the UNION ALL query, doing a `WHERE line_id IN ()`. I'll do post-processing of the data in PHP to set the line_name for the entries, which has a max of 100 entries per page. – DemonGyro Oct 03 '14 at 15:13
1

Remember: Index fast, data slow. Use joins over nested queries. Nested queries return all of the data fields whereas joins just consider the filters (which should all be indexed - make sure there's a unique index on table_product_*.line_id). It's been a while but I'm pretty sure you can join "ON company_id=#" which should cut down the results early on.

In this case, all of the results refer to the same company (or a much smaller subset) so it makes sense to run that query separately (and it makes the query more maintainable).

So your data source would be:

(table_product_stereos as prod
INNER JOIN table_manufacture_line AS ml ON prod.line_id = ml.line_id and prod.company_id=#
UNION
table_product_televisions as prod
INNER JOIN table_manufacture_line as ml on prod.line_id = ml.line_id and prod.company_id=#)

From which you can select prod. or ml. fields as required.

Mike
  • 2,721
  • 1
  • 15
  • 20
  • I also need to handle filtering and paging. The paging is just ORDER BY and LIMIT at the end of the query. Filtering is on line_name using a LIKE comparison against an inputted string. Where would you put this? – DemonGyro Oct 03 '14 at 13:13
  • Think of your queries in parts. This is the source so: select {fields} from {source} {filter} Source in this case is the bracketed expression above, outside the brackets you can add your extra conditions and order. – Mike Oct 03 '14 at 21:11
0

PHP is not a solution at all... Redis can be a solution.

But the main thing I would change is the index creation for the tables (add missing indexe)...If you're running into temp tables you didn't create indexes well for the tables. And 100k rows in not much at all.

But I cant help you without any table creation statements as well as queries you run.

Make sure your "where part" is part of youf btree index from left to right.

Dewstar
  • 211
  • 1
  • 10