5

I have two tables in MySQL sales database:

Orders table:

CREATE TABLE salestest.`orders` (  
`ID` int(11) unsigned NOT NULL auto_increment,  
`OrderDate` datetime NOT NULL,  
`CustomerID` int(11) unsigned NOT NULL,  
PRIMARY KEY (`ID`),  
UNIQUE KEY `ID` (`ID`),  
KEY `OrderDate` (`OrderDate`),  
KEY `CustomerID` (`CustomerID`)  
) ENGINE=InnoDB;  

INSERT INTO salestest.orders VALUES  
( 1, '2012-04-15', 1 ),  
( 2, '2012-05-20', 1 ),  
( 3, '2012-06-30', 1 );  

OrderDetails table:

CREATE TABLE salestest.`OrderDetails` (  
`ID` int(11) unsigned NOT NULL auto_increment,  
`OrderID` int(11) unsigned NOT NULL,  
`ProductID` int(11) unsigned NOT NULL,  
`Price` double NOT NULL default '0',  
PRIMARY KEY  (`ID`),  
UNIQUE KEY `ID` (`ID`),  
KEY `OrderID` (`OrderID`),  
KEY `ProductID` (`ProductID`),  
CONSTRAINT `OrderID_fk` FOREIGN KEY (`OrderID`) REFERENCES `orders` (`ID`)  
) ENGINE=InnoDB;  

INSERT INTO salestest.OrderDetails VALUES  
( 1, 1, 1, 2 ),  
( 2, 1, 2, 15 ),  
( 3, 1, 3, 22 ),  
( 4, 2, 1, 3 ),  
( 5, 2, 2, 17 ),  
( 6, 2, 3, 23 ),  
( 7, 2, 4, 40 ),  
( 8, 3, 1, 4 ),  
( 9, 3, 2, 20 );  

Now I need to select for each customer, the last price they purchase each product.

The easy way to do it is by using a subquery:

SELECT od2.CustomerID,od2.ProductID, od2.Price AS LastPrice, od2.OrderDate AS LastDate  
FROM (SELECT o1.ID, o1.CustomerID, o1.OrderDate, od1.ProductID, od1.Price  
  FROM orders AS o1  
  LEFT JOIN OrderDetails as od1 ON o1.ID=od1.OrderID  
  ORDER BY OrderDate DESC) AS od2  
GROUP BY CustomerID, ProductID  
ORDER BY CustomerID, ProductID;  

Result:

CustomerID ProductID LastPrice LastDate
1 1 4 2012-06-30 00:00:00
1 2 20 2012-06-30 00:00:00
1 3 23 2012-05-20 00:00:00
1 4 40 2012-05-20 00:00:00

Now the question; how is it possible to get the same result if I want to avoid sub-query, temp tables or a view, I only want to use joins; this query is a small part of a much larger query, and having sub-query is highly inefficient.

I tried this query:

SELECT o1.CustomerID,od1.ProductID, od1.Price AS LastPrice, o1.OrderDate AS LastDate
FROM Orders AS o1 LEFT JOIN OrderDetails as od1 ON o1.ID=od1.OrderID
GROUP BY CustomerID, ProductID
ORDER BY CustomerID, ProductID;

but it gives a different result:

CustomerID ProductID LastPrice LastDate
1 1 2 2012-04-15 00:00:00
1 2 15 2012-04-15 00:00:00
1 3 22 2012-04-15 00:00:00
1 4 40 2012-05-20 00:00:00

As you see, LastPrice & LastDate are not correct. I also tried Allen's suggestion, but the result is:

CustomerID ProductID LastPrice LastDate
1 1 4 2012-06-30 00:00:00
1 2 20 2012-06-30 00:00:00

first query from spencer's answer results duplicated products:

CustomerID ProductID LastPrice LastDate
1 3 22 2012-04-15 00:00:00
1 3 23 2012-05-20 00:00:00
1 4 40 2012-05-20 00:00:00
1 1 4 2012-06-30 00:00:00
1 2 20 2012-06-30 00:00:00

other answers all use sub-query, which I am trying to avoid.
any suggestions?

4 Answers4

1

Look for "greatest-n-per-group"

It's the greatest thing I've ever learned in SQL, I hope you love it too.

OK, here's my stab at it:

SELECT o.CustomerID, od.ProductID, od.Price AS LastPrice, o.OrderDate AS LastDate  
FROM OrderDetails od
LEFT JOIN orders as o ON od.OrderID = o.ID
LEFT JOIN orders as o2 ON o.CustomerID = o2.CustomerID AND o.id < o2.id
WHERE o2.id IS NULL
ORDER BY o.CustomerID, od.ProductID;

You want to know, by customer + product, what the last time that customer bought each product and what they paid for it.

So I started with the product, joined the orders (first join), then joined orders again so that I can limit the query to a single order per customer + product (o2 matches all orders up to, but not including the most recent order). We then use the fact that o2 doesn't match the most recent order to only select that one row.

This assumes that you won't have the same item twice in one order with different pricing and that newer orders will always have a higher ID.

Hopefully this gets you close enough that your real data/query can modify as needed - Good luck!

Community
  • 1
  • 1
Allen Holman
  • 125
  • 7
  • Thanks Allen, the only problem in your solution is when customer buy products 1,2,3,4 today, then he buy products 1,2 tomorrow. in this case your query will return information about products 1,2 only, but nothing about 3,4. – user1499268 Jul 03 '12 at 20:54
  • Try this data for testing: INSERT INTO salestest.orders VALUES ( 1, '2012-04-15', 1 ), ( 2, '2012-05-20', 1 ), ( 3, '2012-06-30', 1 ); INSERT INTO salestest.orderDetails VALUES ( 1, 1, 1, 2 ), ( 2, 1, 2, 15 ), ( 3, 1, 3, 22 ), ( 4, 2, 1, 3 ), ( 5, 2, 2, 17 ), ( 6, 2, 3, 23 ), ( 7, 2, 4, 40 ), ( 8, 3, 1, 4 ), ( 9, 3, 2, 20 ); – user1499268 Jul 03 '12 at 21:02
0
select B.*,A.Price from
(select CustomerID,ProductID,OrderDate,Price from Orders o join OrderDetails od on o.ID=od.OrderID) A 
join 
(select CustomerID,ProductID,max(OrderDate) as LastOrderDate 
from Orders o 
join OrderDetails od on o.ID=od.OrderID
group by CustomerID,ProductID) B 
on A.CustomerID=B.CustomerID and A.ProductID=B.ProductID and A.OrderDate=B.LastOrderDate
Dojo
  • 5,374
  • 4
  • 49
  • 79
  • is there anyway to do it without sub-query? – user1499268 Jul 03 '12 at 20:58
  • The two queries A & B are evaluated independently and the outer select happens on the join of A & B which is small. Its not like a loop-inside-a-loop kind of query that you should be worried about. – Dojo Jul 04 '12 at 06:55
0

You may find this more efficient:

select opl.CustomerID,
    opl.ProductID,
    opl.LastDate,
    od.Price
from (
    select o.CustomerID,
        od.ProductID,
        max(o.OrderDate) as LastDate
    from Orders o
    inner join OrderDetails od on o.ID = od.OrderID
    group by o.CustomerID, od.ProductID
) opl
inner join Orders o on opl.CustomerID = o.CustomerID
    and opl.LastDate = o.OrderDate
inner join OrderDetails od on o.ID = od.OrderID
D'Arcy Rittich
  • 167,292
  • 40
  • 290
  • 283
0

UPDATE:

I have been unable to reproduce the result set using only joins (without using an inline view or a correlated subquery).

I don't think it's possible to reliably return the specified result set wihtout using an inline view or a correlated subquery.


This returns the specified result, but with no inlineview and no subqueries. (But that's not to say that this is going to be the "fastest" query that returns the result set.

NOT WORKING... please standby

SELECT o1.CustomerID, d1.ProductID, d1.Price, o1.Orderdate
  FROM orders o1
  JOIN OrderDetails d1 ON d1.OrderID = o1.ID
  LEFT      
  JOIN orders o2 
    ON o1.CustomerID = o2.CustomerID
       AND o1.ID <> o2.ID
       AND (o1.OrderDate < o2.OrderDate
           OR (o1.OrderDate = o2.OrderDate AND o1.ID < o2.ID)
           )
  LEFT
  JOIN OrderDetails d2
    ON d2.OrderID = o2.ID
       AND d2.ProductID = d1.ProductId
       AND (o1.OrderDate < o2.OrderDate
           OR (o1.OrderDate = o2.OrderDate AND o1.ID < o2.ID)
           OR (o1.OrderDate = o2.OrderDate AND o1.ID = o2.ID AND d1.ID < d2.ID )
           )
 WHERE d2.ID IS NULL 

This query joins the tables to themselves and filters out the "topmost" row for each group.

--

Conceptually, that query is the same as the following query. The following query makes use of an "inline view" (aliased as a and b). The purpose of the inline view is really just to get CustomerID and OrderDate associated with each OrderDetail line.

 SELECT a.CustomerID, a.ProductID, a.Price, a.Orderdate
   FROM (SELECT o1.CustomerID, d1.ProductID, d1.Price, o1.OrderDate, d1.OrderID, d1.ID
           FROM orders o1
           JOIN OrderDetails d1 ON d1.OrderID = o1.ID
        ) a
   LEFT      
   JOIN (SELECT o2.CustomerID, d2.ProductID, d2.Price, o2.OrderDate, d2.OrderID, d2.ID
           FROM orders o2
           JOIN OrderDetails d2 ON d2.OrderID = o2.ID
        ) b
     ON a.CustomerID = b.CustomerID
        AND a.ProductID = b.ProductId
        AND a.OrderID <> b.OrderID
        AND a.ID <> b.ID
        AND (a.OrderDate < b.OrderDate 
             OR (a.OrderDate = b.OrderDate AND a.OrderID < b.OrderID)
             OR (a.OrderDate = b.OrderDate AND a.OrderID = b.OrderID AND a.ID < b.ID))
  WHERE b.ID IS NULL 

We'd use a common table expression (CTE) in place of the inline views, if MySQL supported them.


Finally, here's an entirely different approach, which uses MySQL "user variables" to simulate the analytic functions that are missing from MySQL.

SELECT q.CustomerID
     , q.ProductID
     , q.Price
     , q.OrderDate
  FROM (SELECT IF(p.CustomerID = @last_customerid,IF(p.ProductID = @last_productid,0,1),1) AS break
             , @last_customerid := p.CustomerID AS CustomerID
             , @last_productid := p.ProductID AS ProductID
             , p.Price
             , p.OrderDate
         FROM (SELECT @last_customerid := NULL, @last_productid := NULL) i
         JOIN ( SELECT o.CustomerID, d.ProductID, o.OrderDate, d.Price 
                  FROM orders o
                  JOIN OrderDetails d ON d.OrderID = o.ID
                 ORDER BY o.CustomerID, d.ProductID, o.OrderDate DESC
              ) p
       ) q
  WHERE q.break = 1

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • very structured, and very clean code, thanks! Is it possible to rewrite it to do the same without any sub-query? – user1499268 Jul 03 '12 at 21:00
  • the first query is showing duplicated customer/product in the result, not only the last purchase. u can test on the new data i posted – user1499268 Jul 03 '12 at 21:10
  • @user: I'm working with the new data you posted. The second query in my answer (which has the inline views) seems to be returning the correct result set. With the new data, I see that the first query in my answer is returning an incorrect result set, I'm trying to find what's wrong with it. That's the last one I wrote, as an attempt to rewrite the query using the inline views (basically to get the CustomerID and the OrderDate down at the OrderDetails level.) – spencer7593 Jul 03 '12 at 22:58
  • In theory, that first query was supposed to be equivalent to the second query. With the original test data, it "accidentally" returned an equivalent result set. To quote the ever-quotable philosopher Yogi Berra... "In theory, there is no difference between theory and practice. In practice, there is." – spencer7593 Jul 03 '12 at 23:04
  • thanks, i reach to the same conclusion, there is a limit for everything after all, including joins! – user1499268 Jul 06 '12 at 07:11