15

I'm using the northwind database to refresh my SQL skills by creating some more or less complex queries. Unfortunately I could not find a solution for my last use case: "Get the sum of the five greatest orders for every category in year 1997."

The tables involved are:

Orders(OrderId, OrderDate)
Order Details(OrderId, ProductId, Quantity, UnitPrice)
Products(ProductId, CategoryId)
Categories(CategoryId, CategoryName)

I have tried the following query

SELECT c.CategoryName, SUM(
  (SELECT TOP 5 od2.UnitPrice*od2.Quantity 
   FROM [Order Details] od2, Products p2
   WHERE od2.ProductID = p2.ProductID
   AND c.CategoryID = p2.CategoryID
   ORDER BY 1 DESC))
FROM [Order Details] od, Products p, Categories c, Orders o 
WHERE od.ProductID = p. ProductID
AND p.CategoryID = c.CategoryID
AND od.OrderID = o.OrderID
AND YEAR(o.OrderDate) = 1997
GROUP BY c.CategoryName

Well... It turned out that subqueries are not allowed in aggregate functions. I've read other posts about this issue but could not find a solution for my specific use case. Hope you can help me out...

Thomas
  • 232
  • 1
  • 2
  • 8
  • Which version of RDBMS are you using? Also your question is not correct because the order can belong in more than one category. – Hamlet Hakobyan May 01 '13 at 12:10
  • I'm using MSSQL 2012. If you refer to the subquery, there is a WHERE clause - `WHERE c.categoryID = p2.CategoryID` - that should filter for one category only. – Thomas May 01 '13 at 12:14

4 Answers4

35

Subqueries are not generally allowed in aggregate functions. Instead, move the aggregate inside the subquery. In this case, you'll need an extra level of subquery because of the top 5:

SELECT c.CategoryName,
  (select sum(val)
   from (SELECT TOP 5 od2.UnitPrice*od2.Quantity as val
         FROM [Order Details] od2, Products p2
         WHERE od2.ProductID = p2.ProductID
         AND c.CategoryID = p2.CategoryID
         ORDER BY 1 DESC
        ) t
  )
FROM [Order Details] od, Products p, Categories c, Orders o 
WHERE od.ProductID = p. ProductID
AND p.CategoryID = c.CategoryID
AND od.OrderID = o.OrderID
AND YEAR(o.OrderDate) = 1997
GROUP BY c.CategoryName, c.CategoryId
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks that did the job! Just for info... the CategoryID needs to be grouped as well. The last line should look like `GROUP BY c.CategoryName, c.CategoryID`. But thanks for your time, that query looks really weird to me. Need to analyze it in depth now... :) – Thomas May 02 '13 at 16:19
4

Use CTE with ROW_NUMBER ranking function instead of excessive subquery.

 ;WITH cte AS
 (
  SELECT c.CategoryName, od2.UnitPrice, od2.Quantity,
         ROW_NUMBER() OVER(PARTITION BY c.CategoryName ORDER BY od2.UnitPrice * od2.Quantity DESC) AS rn
  FROM [Order Details] od JOIN Products p ON od.ProductID = p.ProductID
                          JOIN Categories c ON p.CategoryID = c.CategoryID
                          JOIN Orders o ON od.OrderID = o.OrderID
  WHERE o.OrderDate >= DATEADD(YEAR, DATEDIFF(YEAR, 0, '19970101'), 0)
    AND o.OrderDate < DATEADD(YEAR, DATEDIFF(YEAR, 0, '19970101')+1, 0)
  )
  SELECT CategoryName, SUM(UnitPrice * Quantity) AS val
  FROM cte
  WHERE rn < 6
  GROUP BY CategoryName
Aleksandr Fedorenko
  • 16,594
  • 6
  • 37
  • 44
3

Its definitely a sub query problem here is an excellent article on this (originally written for Access but the syntax is identical), also orderdate = 1997 will give order date for 1 jan 1997' -- you need datepart(year, orderdate) = 1997, once you have the (up to five) rows returned for each category you can then encapsulate the rows returned and agregate them

3Doubloons
  • 2,088
  • 14
  • 26
Ian P
  • 1,724
  • 1
  • 10
  • 12
  • Thanks for the link. You're right I edited my post. It should be ofc `YEAR(o.OrderDate) = '1997'` – Thomas May 01 '13 at 12:22
  • Could you provide an example for "encapsulate the rows returned and aggregate them"? – Thomas May 01 '13 at 12:28
  • SELECT x.A, x.B, x.C, SUM(x.d) AS D FROM ( Any valid sql select statement containing columns a, b , c, d, e) x GROUP BY x.d – Ian P May 01 '13 at 15:07
  • This is also a sub query, however its a very simple one and the top 5 grouped by Category subquery is a more advanced and powerful use. I did not want to confuse by using the same descriptive nown. More important for you to be aware of the full power of sub queries. – Ian P May 01 '13 at 15:16
0

I ran into a very similar problem with an Access subquery where the records were sorted by date. When I used the "Last" aggregate function I found it passed through all of the subqueries and retrieved the last row of data from the Access table, and not the sorted query as intended. Although I could have rewritten the query to use the aggregate function within the first set of parenthesis (as was previously suggested) I found it easier to save query results as a table in the database sorted in the order I wanted and then use the "Last" aggregate function to retrieve the values I wanted. I'll run an update query in the future to keep results current. Not efficient but effective.