1

I'm trying to write a query that joins 2 tables and will give me the top 5 names and the amount of items they sold at a location between a certain date range i.e. 01-01-2016 through 12-31-2017.

From what I've been researching, This is what I came up with:

SELECT
    EmployeeName, 
    COUNT(ID) AS 'Number of Deals', 
    CompanyNumber
FROM
(
    SELECT
        EmployeeName, 
        DealID, 
        CompanyNumber,
        ROW_NUMBER() OVER (PARTITION BY CompanyNumber ORDER BY DealID) AS rn
    FROM Deal
    JOIN DealEmployee
        ON Deal.DealID. =DealEmployee.DealID AS T
    WHERE
        Deal.Status = 2 AND
        Date BETWEEN '2016-01-01' AND '2017-12-31' AND
        EmployeeName != '' AND T.rn <=5 

I am hoping to get the the below result:

enter image description here

I am quite the novice, and I know my syntax is incorrect. Any help given would be greatly appreciated

Bmorewize
  • 39
  • 8

3 Answers3

2

not tested, but I would try something like:

with
basedata as (    
select EmployeeName
     , CompanyNumber
     , COUNT(ID) as Number_of_Deals
  from Deal
  join DealEmployee
    on Deal.DealID = DealEmployee.DealID    
 where Deal.Status = 2
   and Date between '2016-01-01' and '2017-12-31'
   and EmployeeName !=''
 group by EmployeeName
        , CompanyNumber 
)
,
basedata_with_rank as (
select t.*
     , row_number() over (partition by CompanyNumber order by Number_of_Deals desc) rn
  from basedata
)
select *
  from basedata_with_rank
 where rn <= 5
 order by CompanyNumber
        , Number_of_Deals desc

Using CTEs makes queries usually more readable. By the way: I would avoid to give a column the name "date" (a reserved word) and I would always use qualified names for my columns. Perhaps the use of the analytic functions rank or dense rank would be more appropriate, but row_number should also work.

Martin Preiss
  • 396
  • 3
  • 9
  • Thank you! I was able to get my desired result, however I'm having a little difficulty understanding the 2nd half of the query. But it works like a charm. – Bmorewize Aug 09 '18 at 15:46
  • 1
    the first CTE basedata includes the join and your conditions and aggregates the result. The second CTE basedata_with_rank adds the ranking based on the aggregated result from basedata. And the main query filters the top 5 elements based on the ranking from the second CTE. – Martin Preiss Aug 09 '18 at 18:25
1

You need to filter on the row number outside the subquery, after it has been calculated/materialized. I think this is what you want:

SELECT EmployeeName, CompanyNumber, cnt
FROM
(
    SELECT EmployeeName, CompanyNumber, COUNT(*) AS cnt,
        ROW_NUMBER() OVER (PARTITION BY CompanyNumber ORDER BY COUNT(*) DESC) rn
    FROM Deal d
    INNER JOIN DealEmployee de
        ON d.DealID = de.DealID
    WHERE d.Status = 2 AND Date BETWEEN '2016-01-01' AND '2017-12-31' AND
          EmployeeName != ''
    GROUP BY EmployeeName, CompanyNumber
) t
WHERE rn <= 5
ORDER BY
    CompanyNumber,
    cnt DESC;

Note that we perform the GROUP BY aggregation inside the subquery, where we also compute the row number.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Just to note that this won't work; you can't `ORDER BY cnt` as you will just get an error of `Invalid column name 'cnt'`. You actually have to do this in two steps, as shown in the answer by Martin. – Richard Hansell Aug 09 '18 at 15:23
  • @RichardHansell Um...I have no idea what you are talking about. In the outer query, SQL Server should view `cnt` as any other column AFAIK. You might be correct that if we tried to `ORDER BY cnt` in the subquery, it would fail perhaps if SQL Server does not allow aliases to be used in the `ORDER BY` clause. – Tim Biegeleisen Aug 09 '18 at 15:25
  • This is indeed the case, in the outer query it's fine, in the inner query you can't use an alias you just defined in your windowed function `ROW_NUMBER()`. Now the behaviour might vary by vendor, so although this throws an error in SQL Server it might be okay in Oracle? Just to note this DOES throw an error in SQL Server, as I threw together a quick test with some temporary tables to make sure I wasn't talking garbage. – Richard Hansell Aug 09 '18 at 15:29
  • Rats...you're right...you should have mentioned the problem is with `ROW_NUMBER`. I didn't/couldn't really test this query, because it's complex, the OP gave no sample data or demo, and it's late, and I can't be bothered. No, you're certainly not talking garbage. – Tim Biegeleisen Aug 09 '18 at 15:30
  • Well that's an extremely honest answer ;P Your edit works fine though, and to be honest I learned something today, as I didn't know you could use a `COUNT(*)` inside a `ROWNUMBER()` and honestly thought this would have to be done in two stages, as I originally said. So it's all good :D – Richard Hansell Aug 09 '18 at 15:36
  • Yeah...I've written queries like this for SQL Server at work before, so I was pretty sure there would be no problems. Thanks for the feedback. – Tim Biegeleisen Aug 09 '18 at 15:38
0

I think you want this:

SELECT EmployeeName, num_deals, CompanyNumber
FROM (SELECT EmployeeName, CompanyNumber, COUNT(*) as num_deals,
             ROW_NUMBER() OVER (PARTITION BY CompanyNumber ORDER BY COUNT(*) DESC) AS seqnum
      FROM Deal d JOIN
           DealEmployee de
           ON Deal.DealID. = de.DealID 
      WHERE d.Status = 2 AND
            Date BETWEEN '2016-01-01' AND '2017-12-31' AND
            EmployeeName <> ''
      GROUP BY employee, CompanyNumber
     ) t
WHERE seqnum <= 5;

Notes:

  • You need to aggregate before doing the row_number(), because you want to enumerate based on the count. (This version combines the aggregation and row_number() into a single subquery; the aggregation goes first.)
  • The number of deals is aggregated by employee and company.
  • When you have multiple tables, qualify all column names.
  • Your query has a bunch of other errors, such as misplaced table aliases and some lines are missing.
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786