1

Howdie do,

I have the following 3 tables: order, manifest and tracking_updates. Now, each order has foreign key called manifest_id that references the manifest table. Several orders can be in a manifest. The tracking_updates table has a foreign key called order_id that references the order table.

Now, the manifest table contains a column named upload_date. That column, upload_date is the column I need to use in order to determine if an order was uploaded in the last 30 days.

The tracking_update table can contain many updates for each order and so, I must return the most recent tracking update status for each order that matches the criteria below:

1. orders < 30 days, any delivery status 
2. orders > 30 days, not delivered

Please see tables below

**Order**

ID |  manifest_id

1  |  123

2  |  123

3  |  456

**Manifest**:

ID |  upload_date

123  |  2015-12-15 09:31:12

456  |  2015-10-13 09:31:12

**Tracking Update**:

order_id  | status_type | last_updated

1         |  M          | 2015-12-15 00:00:00

1         |  I          | 2015-12-16 07:20:00

1         |  D          | 2015-12-17 15:20:00

2         |  M          | 2015-12-15 00:00:00

2         |  D          | 2015-12-16 15:20:00

3         |  M          | 2015-10-13 00:00:00

3         |  I          | 2015-10-14 12:00:00

3         |  E          | 2015-10-15 13:50:00

This is what the result set would look like for the orders above

**Result Set**

order_id  | manifest_id | latest_tracking_update_status

1         | 123         | D

2         | 123         | D

3         | 456         | E

As you can see, order 1, 2 are assigned to manifest 123 and the manifest was uploaded within the last 30 days and their latest tracking update shows a 'D' for delivered. So those two orders should be included in the result set.

The order 3 is older then 30 days, but hasn't been delivered based off the latest tracking_update status_type, so it should show up in the result set.

Now, the tracking_update table as well over 1 million updates across all orders. So I'm really going for efficiency here

Currently, I have the following queries.

Query #1 returns orders that have been uploaded within the last 30 days and their corresponding latest tracking update

SELECT 
fgw247.order.id as order_id,
    (SELECT 
        status_type
    FROM 
        tracking_update as tu
    WHERE 
        tu.order_id = order_id
    ORDER BY 
        tu.ship_update_date DESC
    LIMIT 
        1
    ) as latestTrackingUpdate
FROM 
    fgw247.order, manifest
WHERE
    fgw247.order.manifest_id = manifest.id 
AND
    upload_date >= '2015-12-12 00:00:00'

Query #2 returns the order_id and latest tracking update for every order in the tracking_update table:

SELECT tracking_update.order_id,
    substring_index(group_concat(tracking_update.status_type order by tracking_update.last_updated), ',', -1)
FROM 
    tracking_update
WHERE 
    tracking_update.order_id is not NULL
GROUP BY tracking_update.order_id

I'm just not sure how to combine these queries to get my orders that match the criteria:

  1. orders < 30 days, any delivery status
  2. orders > 30 days, not delivered

Any ideas would be GREATLY appreciated.

* UPDATE *

This is the current query thanks to answer selected:

select 
    o.id, t.maxudate, tu.status_type, m.upload_date
from 
    (select order_id, max(last_updated) as maxudate from tracking_update group by order_id) t
inner join 
    tracking_update tu on t.order_id=tu.order_id and t.maxudate=tu.last_updated
right join 
    fgw247.order o on t.order_id=o.id
left join 
    manifest m on o.manifest_id=m.id
where 
    (tu.status_type != 'D' and tu.status_type != 'XD' and m.upload_date <='2015-12-12 00:00:00') or m.upload_date >= '2015-12-12 00:00:00'
LIMIT 10

UPDATE

This is the current query that joins the three tables rather efficiently

SELECT 
    o.*, tu.*
FROM
    fgw247.`order` o
JOIN
    manifest m
ON
    o.`manifest_id` = m.`id` 

JOIN
    `tracking_update` tu
ON
    tu.`order_id` = o.`id` and tu.`ship_update_date` = (select max(last_updated) as last_updated from tracking_update where order_id = o.`id` group by order_id)
WHERE
    m.`upload_date` >= '2015-12-14 11:50:12' 
    OR 
        (o.`delivery_date` IS NULL AND m.`upload_date` < '2015-12-14 11:50:12')
LIMIT 100
Jimmy
  • 887
  • 1
  • 10
  • 24
  • What you can do is Join All the tables on their respective condtions like ID, ManifestID and OrderID and after that select Upload_Date and Last_updated as result too and put in some temporary table. After that you can check the difference of Dates using Condition statements Like Select Case [Last_updated -Upload_Date ] >30 then D else P – superB Jan 09 '16 at 20:02
  • Wouldn't this exceed the max join size? The tracking update table is well over 1M records. – Jimmy Jan 09 '16 at 20:08

2 Answers2

1

Have a subquery that returns the latest update date from the tracking table for each order. Join this subquery on the tracking, orders, and manifests tables to get the details and filter based on the upload date in the where clause:

select o.order_id, t.maxudate, tu.status_type, m.upload_date
from (select order_id, max(update_date) as maxudate from tracking_update group by order_id) t
inner join tracking_update tu on t.order_id=tu.order_id and t.maxudate=tu.update_date
right join orders o on t.order_id=o.order_id
left join manifests m on o.manifest_id=m.manifest_id
where (tu.status_type<>'D' and curdate()-m.upload_date>30) or curdate()-m.upload_date<=30

It may be more efficient to use a union query instead of the or criteria in the where clause.

Shadow
  • 33,525
  • 10
  • 51
  • 64
  • While I agree with the concept, this actually doesn't run – Jimmy Jan 09 '16 at 20:41
  • Pls provide some meaningful feedback, such as exact query you ran, the error message you received or unexpected behaviour you encountered. – Shadow Jan 09 '16 at 21:10
  • I receive a syntax error after m.upload_date. So I placed a comma, but then I receive a syntax error on the inner join – Jimmy Jan 09 '16 at 21:12
  • I missed the from keyword before the parentheses, I update the answer – Shadow Jan 09 '16 at 21:22
  • Thank you, but I'm now receiving: Error Code: 1054. Unknown column 'o.order_id' in 'field list'. I've updated my question to show the query I've run – Jimmy Jan 09 '16 at 21:38
  • Pls be a bit independent and interpret the error message, do not just copy it back bere! I may have used a different column name than you have in your tables. – Shadow Jan 09 '16 at 21:43
  • Thank you and u were correct. I forgot to change a field. The only issue I have with this is that it takes about 3.8 seconds to return 10 rows. Please see my updated query above – Jimmy Jan 10 '16 at 00:29
  • 1. Watch out, your date criteria in where are overlapping. 2. As I indicated at the end of my answer, consider using a union query instead of the or condition in where. 3. I would post the optimisation as a separate question with create table statements for all affected tables so we can see the existing indexes, and the explain for the select query. – Shadow Jan 10 '16 at 00:35
  • Thanks for your input. I will post a different question for optimization. As for the union, not sure how you would place this in the or condition as union is supposed to be used with multiple select statements – Jimmy Jan 11 '16 at 14:59
  • I've posted a second question for optimization per your request: http://stackoverflow.com/questions/34725091/how-to-optimize-the-sql-query-that-joins-3-table – Jimmy Jan 11 '16 at 15:24
0

You can perform a JOIN with the 2nd query result like

SELECT 
fgw247.order.id as order_id,
xx.some_column,
    (SELECT 
        status_type
    FROM 
        tracking_update as tu
    WHERE tu.order_id = order_id
    ORDER BY 
        tu.ship_update_date DESC
    LIMIT 
        1
    ) as latestTrackingUpdate
FROM 
    fgw247.order JOIN manifest
ON fgw247.order.manifest_id = manifest.id 
JOIN (
SELECT tracking_update.order_id,
    substring_index(group_concat(tracking_update.status_type order by tracking_update.last_updated), ',', -1) AS some_column
FROM 
    tracking_update
WHERE 
    tracking_update.order_id is not NULL
GROUP BY tracking_update.order_id ) xx ON xx.order_id = fgw247.order.id

WHERE upload_date >= '2015-12-12 00:00:00'
Rahul
  • 76,197
  • 13
  • 71
  • 125
  • *I don't think that ...* and why so? – Rahul Jan 09 '16 at 20:21
  • It runs a subquery for each order id. – Shadow Jan 09 '16 at 21:11
  • Thank you for your reply. I removed the subquery from the initial select as it's redundant when you join with the second query. Yes, this does take care of joining the two tables, but what about the orders that are not delivered over 30 days? – Jimmy Jan 09 '16 at 21:28