4

I have this dataset:

shopID    supplier    supply_count
1         a           12
2         b           12
2         f           12
2         e           10
3         b           5
3         a           2
4         f           15
4         c           11

I have used this query to get 2nd highest value for each group:

select s1.shopId, max(s1.supply_count)
from supply s1
where supply_count NOT IN (
   select max(supply_count) 
   from supply s2
   where s1.shopId = s2.shopId
)
group by s1.shopId

The results I'm getting are:

shopID   supply_count
2        10
3        2
4        11

The desired output is: (in case there is no second highest I want to present the highest)

shopID   supply_count
1        12
2        10
3        2
4        11

Based on this question: Retrieve 2nd highest count by each group

Amirhossein
  • 1,148
  • 3
  • 15
  • 34
XYZ
  • 71
  • 1
  • 6

2 Answers2

3

You can get desired output to include the value where each shopId has only a single row by also counting the rows for each ShopId.

Based on some more varied data and comments I believe the following should give the desired results:

with c as (
    select *,
        case when dense_rank() over(partition by shopid order by supply_count desc)=2 
            or Min(supply_count) over(partition by shopid)
              =Max(supply_count) over(partition by shopid) 
        then 1 else 0 end as valid
    from t
)
select distinct shopId, supply_count
from c
where valid=1

Example DBFiddle with some more varied data

Stu
  • 30,392
  • 6
  • 14
  • 33
  • 1
    Good. This is shorter than the one I was writing. – The Impaler Oct 16 '21 at 12:48
  • I think you should use ```dense_rank()``` rather than a ```row_number()``` for your query to work. – mani bharataraju Oct 16 '21 at 12:48
  • @manibharataraju why? Dense_rank() could duplicate rows, so that's for the OP to choose - my understanding is 1 row per ShopId – Stu Oct 16 '21 at 12:53
  • @Stu - If you look at the input you can see that there are two rows of 12 for shopid 2, so based on your query, it will give 12 as the output whereas the answer should be 10. You can add a ```group by``` after valid=1 to prevent multiple rows – mani bharataraju Oct 16 '21 at 13:00
  • Yes actually with some more varied test data I will revise slightly. – Stu Oct 16 '21 at 13:09
  • I get error: 'c' is not a recognized GROUP BY option. – XYZ Oct 16 '21 at 19:31
  • @XYZ that's odd there is no group by in this query - can you check what you have with the DBFiddle linked above? – Stu Oct 16 '21 at 19:34
  • added ; at the beginning of the query and the issue was solved. – XYZ Oct 16 '21 at 19:45
  • You're probably using SQL Server? `;` is a row *terminator*, it should terminate the previous statement; it's mostly optional currently in SQL Server but a few statements such as `with` require the previous statement is terminated. – Stu Oct 16 '21 at 19:47
1
select 
shopID,
supply_count
from
(select shopID,
supply_count, dense_rank() over(partition by shopID order by supply_count desc) as rn
from supply) a
where rnk=2
group by
shopID,
supply_count
union
select shopID,
supply_count
from supply
where shopID in (select shopID from supply group by shopID having count(distinct supply_count) =1) a
group shopID,
supply_count;

I believe this should work. What I have tried here is to identify the second highest supply count by assigning a rank for each group and doing a union with the shopIDs that have only one record.

If you don't understand what a dense_rank() does, read this : https://towardsdatascience.com/how-to-use-sql-rank-and-dense-rank-functions-7c3ebf84b4e8

mani bharataraju
  • 162
  • 1
  • 21