1

I am trying to improve a query which does the following:

For every job, add up all the costs, add up the invoiced amount, and calculate a profit/loss. The costs come from several different tables, e.g. purchaseorders, users_events (engineer allocated time/time he spent on site), stock used etc.
The query also needs to output some other columns like the name of the site for the work, so that that column can be sorted by (an ORDER BY is appended after all of this).

SELECT
    jobs.job_id,
    jobs.start_date,
    jobs.end_date,
    events.time,
    sites.name site,
    IFNULL(stock_cost,0) stock_cost,
    labour,
    materials,
    labour+materials+plant+expenses revenue,
    (labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) profit,
    ((labour+materials+plant)-(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)))/(time*3557/360000+IFNULL(orders_cost,0)+IFNULL(stock_cost,0)) ratio

FROM
    jobs

    LEFT JOIN (
        SELECT
            job_id,
            SUM(labour_charge) labour,
            SUM(materials_charge) materials,
            SUM(plant_hire_charge) plant,
            SUM(expenses) expenses
        FROM invoices
        GROUP BY job_id
        ORDER BY NULL
    ) invoices USING(job_id)

    LEFT JOIN (
        SELECT
            job_id,
            SUM(IF(start_onsite && end_onsite,end_onsite-start_onsite,end-start)) time,
            SUM(travel+parking+materials) user_expenses
        FROM users_events
        WHERE type='job'
        GROUP BY job_id
        ORDER BY NULL
    ) events USING(job_id)

    LEFT JOIN (
        SELECT
            job_id,
            SUM(IFNULL(total,0))*0.01 orders_cost
        FROM purchaseorders
        GROUP BY job_id
        ORDER BY NULL
    ) purchaseorders USING(job_id)

    LEFT JOIN (
        SELECT
            location job_id,
            SUM(amount*cost))*0.01 stock_cost
        FROM stock_location
        LEFT JOIN stock_items ON stock_items.id=stock_location.stock_id
        WHERE location>=3000 AND amount>0 AND cost>0
        GROUP BY location
        ORDER BY NULL
    ) stock USING(job_id)

    LEFT JOIN contacts_sites sites ON sites.id=jobs.site_id;

I read this: http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html but don't see how/if I can apply anything therein. For testing purposes, I have tried adding all sorts of indices on fields left, right and centre with no improvement to the EXPLAIN output:

+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
| id | select_type | table          | type   | possible_keys          | key     | key_len | ref                                | rows  | Extra                         |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+
|  1 | PRIMARY     | jobs           | ALL    | NULL                   | NULL    | NULL    | NULL                               |  7088 |                               |
|  1 | PRIMARY     | <derived2>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |  5038 |                               |
|  1 | PRIMARY     | <derived3>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |  6476 |                               |
|  1 | PRIMARY     | <derived4>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |   904 |                               |
|  1 | PRIMARY     | <derived5>     | ALL    | NULL                   | NULL    | NULL    | NULL                               |   531 |                               |
|  1 | PRIMARY     | sites          | eq_ref | PRIMARY                | PRIMARY | 4       | bestbee_db.jobs.site_id            |     1 |                               |
|  5 | DERIVED     | stock_location | ALL    | stock,location,amount,…| NULL    | NULL    | NULL                               |  5426 | Using where; Using temporary; |
|  5 | DERIVED     | stock_items    | eq_ref | PRIMARY                | PRIMARY | 4       | bestbee_db.stock_location.stock_id |     1 | Using where                   |
|  4 | DERIVED     | purchaseorders | ALL    | NULL                   | NULL    | NULL    | NULL                               |  1445 | Using temporary;              |
|  3 | DERIVED     | users_events   | ALL    | type,type_job          | NULL    | NULL    | NULL                               | 11295 | Using where; Using temporary; |
|  2 | DERIVED     | invoices       | ALL    | NULL                   | NULL    | NULL    | NULL                               |  5320 | Using temporary;              |
+----+-------------+----------------+--------+------------------------+---------+---------+------------------------------------+-------+-------------------------------+

The rows produced is 5 x 10^21 (down from 3 x 10^42 before I started optimising this query!)
It currently takes seven seconds to execute (down from 26) but I would like that to be under one second.

By the way: GROUP BY x ORDER BY NULL is a great way to eliminate unnecessary filesorts from subqueries! (from http://www.mysqlperformanceblog.com/2006/09/04/group_concat-useful-group-by-extension/)

Nicholas Shanks
  • 10,623
  • 4
  • 56
  • 80
  • @devart That has already been tried, and made no difference to the usage of indices. I just noticed that the EXPLAIN output is from before I added ORDER BY NULL (which I did half-way through writing the question). Sorry for that. I will replace it with the one from the posted query (possible_keys truncated again for brevity) – Nicholas Shanks Jun 21 '11 at 12:33
  • You are starting with Jobs.. which is ALL jobs, even those that have had no invoicing or expenses incurred which would result in no profit or loss (except research time on the project and time to prepare bid/contract offers). OR... Do you only want those jobs that have AT LEAST had SOMETHING Invoiced... If so, there can be some minor tweaks to adjust the query to optimize it for you. – DRapp Jun 22 '11 at 10:47
  • @drapp Indeed, I was previously using INNER JOINs for the sites, events and invoices tables, on the assumption that my boss would only want to see jobs where I could reliably calculate a profit/loss (had been invoiced), that each job actually had an engineer allocated to do the work, and that each jobs.site_id would be a valid foreign key value. MySQL did then use some of the indices that I had created, but my boss additionally wants to see the costs so far incurred for all jobs in progress, whether or not any engineer time is allocated, leaving me with nothing but LEFT JOINS. – Nicholas Shanks Jun 22 '11 at 11:13

1 Answers1

0

Based on your comment to my question, I would do the following...

At the very top...

SELECT STRAIGHT_JOIN (just add the "STRAIGH_JOIN" keyword)

Then, for each of your subqueries for invoices, events, p/o's, etc, change the ORDER BY to the JOB_ID explicitly so it might help the optimization against the primary JOBS table join.

Finally, ensure each of your subquery tables HAS an index on the Job_ID (Invoices, User_events, PurchaseOrders, Stock_Location)

Additionally, for the Stock_Location table, you might want to help the WHERE clause for your subquery by having a compound index on
(job_id, location, amount) Three fields deep should be enough even though you have the key plus 3 where condition elements.

DRapp
  • 47,638
  • 12
  • 72
  • 142
  • for stock_location, the `location` column is the job_id if it's value is >= 3000, otherwise, it's the warehouse (0) or a van_id (1-2999). I can ignore stock at those latter two types of locations. – Nicholas Shanks Jun 22 '11 at 13:51
  • it will now be no earlier than Monday before I get to work on this bit of code again, so will vote/accept then. – Nicholas Shanks Jun 23 '11 at 08:43
  • Okay, so Monday was over a year ago now, and I still haven't gone back to that bit of code to test your solution. I promise to get to it some day :) – Nicholas Shanks Nov 08 '12 at 12:37