1

I want to get the row per group with the min value of two columns.

I have a table that has listings for items I want, as well as their cost and distance from me.

mytable:
item | cost | dist
-----+------+---------
1    | $2   | 1.0
1    | $3   | 0.5
1    | $4   | 2.0
2    | $2   | 2.0
2    | $2   | 1.5
2    | $2   | 4.0
2    | $8   | 1.0
2    | $12  | 3.0
3    | $1   | 5.0

For each item, I want to get the row that has the min cost, then if there are multiple of the min cost, get the one with the min dist

so my result would be

item | cost | dist
-----+------+---------
1    | $2   | 1.0
2    | $2   | 1.5
3    | $1   | 5.0

I know I can achieve this result using

SELECT * 
, ROW_NUMBER() OVER(PARTITION BY item ORDER BY cost ASC, dist ASC) as [RID]
FROM mytable
WHERE [RID] = 1

but the problem comes when I have 100,000 items each with 100,000 listings, and sorting the whole table becomes incredibly time-consuming.

Since I only need the top 1 of each group, I'm wondering if there is another way to get the result I want without sorting the whole table of 10,000,000,000 entries.

Currently using SQL Server 2012

M. McCrary
  • 35
  • 3

4 Answers4

1

A nice article on this topic is by Itzik Ben Gan - Optimizing TOP N Per Group Queries. This discusses a concatenation approach.

For example if your table is

CREATE TABLE #YourTable
  (
     item INT,
     cost MONEY CHECK (cost >= 0),
     dist DECIMAL(10, 2) CHECK (dist >= 0)
  ) 

you might use

WITH T AS
(
SELECT item,  
       MIN(FORMAT(CAST(cost * 100 AS INT), 'D10') + FORMAT(CAST(dist * 100 AS INT), 'D10')) AS MinConcat
FROM #YourTable
GROUP BY item
)
SELECT item,
       CAST(LEFT(MinConcat,10)/100.0 AS MONEY),
       CAST(RIGHT(MinConcat,10)/100.0 AS  DECIMAL(10,2))
FROM T

So this can be done in a single grouping operation on id (which could be a hash aggregate without any sort).

You need to be careful that the value of the concatenated result has the same ordering when treated as a string as cost, dist would have when treated as raw column values so the query above may need adjusting if your datatypes are different.

It currently reserves the left most 10 characters for cost represented as an integer number of pence and padded with leading zerores, and dist as a 10 digit integer similarly.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • The article is a great resource and exactly what I was looking for, thanks! It's a shame that the images are no longer available. – M. McCrary Feb 12 '19 at 23:03
0

You can do this way

; with c as 
(select min(cost) as cost, item
from mytable
group by item)
select t.* from mytable t
inner join c
on c.item = t.item and c.cost=t.cost;

However, I'd recommend you to add index to item and cost columns to make the query fast.

[Edit] After re-reading the OP question, it should be like the following when there are ties in cost,

; with c as 
(select min(cost) as cost, item
from mytable
group by item)
, c2 as (
select t.cost, t.item, min(dist) as dist from mytable t
inner join c
on c.item = t.item and c.cost=t.cost
group by t.cost, t.item)
select  t.item,t.cost, c2.dist from mytable t
inner join c2
on c2.item = t.item, and c2.cost = t.cost;

Maybe there are better ways, but this should work.

jyao
  • 1,550
  • 3
  • 19
  • 26
  • This is missing any logic to select the minimum item by dist per id if rows are tied with the same min cost in a group – Martin Smith Feb 12 '19 at 18:08
  • The OP's question is about the minimum cost of item in each group, so if there are rows tied in cost, they should be selected out. If not, we just need to put a `distinct` in the select statement – jyao Feb 12 '19 at 18:12
0

If you have a table of items, then this might work:

select i.*, t.*
from items i cross apply
     (select top (1) t.*
      from t
      where t.item = i.item
      order by cost, dist
     ) t;

For this to be efficient, you need an index on (item, cost, dist).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Something like this should work:

select
    t.item, MIN(t.cost) as mincost, min(t2.mindist) as mindist
from mytable t
inner join (
select item, cost, MIN(dist) as mindist
    from mytable
    group by
        item, cost
) t2 on t.item = t2.item
group by t.item,t2.cost
having MIN(t.cost) = t2.cost
Chris Hackett
  • 399
  • 1
  • 9