
Based on the following table:

Table_A

ID   Rev  Description
-----------------------------------
1    1    Some text.
1    2    Some text. Adding more.
1    3    Some text. Adding more & more.

The table above keeps getting a new row each time a user updates the description.
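For reference, here is a minimal repro script; the column types are just assumptions for illustration, not necessarily what the real table uses:

CREATE TABLE Table_A
(
    ID          int          NOT NULL,   -- assumed type
    Rev         int          NOT NULL,   -- assumed type
    Description varchar(100) NOT NULL,   -- assumed type
    CONSTRAINT PK_Table_A PRIMARY KEY (ID, Rev)
);

INSERT INTO Table_A (ID, Rev, Description)
VALUES (1, 1, 'Some text.'),
       (1, 2, 'Some text. Adding more.'),
       (1, 3, 'Some text. Adding more & more.');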

I want to take the row with MAX(Rev) [i.e. the latest description].

To get this I do the following:

;with AllDescriptions As
(
 select 
        ID
        , Rev
        , Description 
        , ROW_NUMBER() over (partition by ID order by Rev desc) as RowNum
        from Table_A
        Where ID = 1
)
select ID, Rev, Description from AllDescriptions
where RowNum = 1

Recently I saw a different approach to getting the same result:

select b.* from 
(
 select ID, MAX(Rev) as MaxRev 
 from Table_A 
 where ID = 1
 group by ID
) as a
inner join 
(
 select ID, Rev, Description from Table_A where ID = 1
) as b
on a.ID = b.ID and a.MaxRev = b.Rev

From a learning perspective, I want to know which of the above two approaches is better. Or is there an even better way to do the same thing?

stackoverflowuser
  • Check the execution plans. Version 2 may exhibit some single-seek-plus-partition optimizer magic. I've seen this example before, don't remember where. *Inside SQL Server 2008: T-SQL Programming*, perhaps? – Peter Radocchia Sep 07 '10 at 23:41
  • @Peter - I seem to recall that they did a comparison against either a JOIN or CROSS APPLY with `TOP 1` in those books, but I forget the details. – Martin Smith Sep 07 '10 at 23:45
  • I've just tried both - `STATISTICS IO` gives almost identical numbers of reads for both options, with a supporting clustered index on the table. The query plan figures that the ROW_NUMBER variant is much more expensive wrt. 'Query Cost Relative to Batch' - but I think this is misleading in this instance. – Will A Sep 07 '10 at 23:55
  • @Will I've quite lost faith in the `Query Cost Relative to Batch` figures since I found them to be so woefully wrong in this case http://stackoverflow.com/questions/3424650/sql-query-pervious-row-optimisation/3426364#3426364 – Martin Smith Sep 07 '10 at 23:59
  • @Martin - Likewise, they're always different from the actual cost in terms of CPU / IO. It'd be nice to get a subset of the `STATISTICS IO` and `STATISTICS TIME` information shown here instead. – Will A Sep 08 '10 at 00:13
  • @stackoverflowuser I'd love to hear about performance results for the queries in my answer, if you feel like checking it out... – ErikE Sep 08 '10 at 01:42

3 Answers


The second approach looks like a SQL Server 2000-era approach, from before ROW_NUMBER() was introduced. This is the greatest-n-per-group problem.

To evaluate them, you should look at the execution plans and the I/O statistics, by running SET STATISTICS IO ON.
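For example, a rough sketch of how you might compare them (run each query in turn and compare the logical reads and times reported in the Messages tab):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the ROW_NUMBER() version here...
-- ...then run the MAX()/JOIN version, and compare the reads and elapsed times

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;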

Of course, for the specific example you have given, the following would work equally well:

 select TOP 1
        ID
        , Rev
        , Description 
        from Table_A
        Where ID = 1
ORDER BY Rev desc
Martin Smith

If you have a parent table with each ID listed only once, this can sometimes outperform other strategies including the row_number solution:

SELECT
   X.*
FROM
   ParentTable P
   CROSS APPLY (
      SELECT TOP 1 *
      FROM Table_A A
      WHERE P.ID = A.ID
      ORDER BY A.Rev DESC
   ) X
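
Here ParentTable is assumed to be a header table keyed on ID (one row per ID), something like:

CREATE TABLE ParentTable
(
    ID int NOT NULL PRIMARY KEY
    -- ...plus whatever other per-ID columns you carry
);

With a supporting index on Table_A (ID, Rev), each TOP 1 can typically be satisfied by a single seek per parent row, which is where the win usually comes from.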

And the crazy, patented (just kidding) single-scan magic query, which can often outperform other methods as well:

SELECT
   ID,
   -- unpack: the first 4 bytes are the Rev, the rest is the Description
   Rev = Convert(int, Substring(Packed, 1, 4)),
   Description = Convert(varchar(100), Substring(Packed, 5, 100))
FROM
   (
      SELECT
         ID,
         -- pack Rev and Description into one binary value; because the
         -- comparison is byte-wise, MAX picks the row with the highest Rev
         -- (assuming non-negative Rev values) in a single scan
         Packed = Max(Convert(binary(4), Rev) + Convert(varbinary(100), Description))
      FROM Table_A
      GROUP BY ID
   ) X

Note: this last method is not recommended, but it's a fun way to simulate the First/Last aggregates found in MS Access.

ErikE

I would tend to favour the first approach. From a readability perspective, once you're comfortable with the ROW_NUMBER() OVER ... syntax, it's somewhat more readable. From a performance perspective, I would be surprised if there were much difference between the two; if there is, I would expect the second to perform worse, but I stand to be corrected on that!

Will A