Efficiently query for the latest version of a record using SQL

Question

I need to query a table for the latest version of a record for all available dates (end of day time-series). The example below illustrates what I am trying to achieve.

My question is whether the table's design (primary key, etc.) and the LEFT OUTER JOIN query is accomplishing this goal in the most efficient manner.

CREATE TABLE [PriceHistory]
(
    [RowID] [int] IDENTITY(1,1) NOT NULL,
    [ItemIdentifier] [varchar](10) NOT NULL,
    [EffectiveDate] [date] NOT NULL,
    [Price] [decimal](12, 2) NOT NULL,

    CONSTRAINT [PK_PriceHistory] 
       PRIMARY KEY CLUSTERED ([ItemIdentifier] ASC, [RowID] DESC, [EffectiveDate] ASC)
)

INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-15',5.50)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-16',5.75)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-16',6.25)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-17',6.05)
INSERT INTO [PriceHistory] VALUES ('ABC','2016-03-18',6.85)
GO

SELECT 
    L.EffectiveDate, L.Price
FROM 
    [PriceHistory] L
LEFT OUTER JOIN 
    [PriceHistory] R ON L.ItemIdentifier = R.ItemIdentifier 
                     AND L.EffectiveDate = R.EffectiveDate 
                     AND L.RowID < R.RowID
WHERE 
    L.ItemIdentifier = 'ABC' and R.EffectiveDate is NULL
ORDER BY 
    L.EffectiveDate

Follow up: Table can contain thousands of ItemIdentifiers each with dacades worth of price data. Historical version of data needs to be preserved for audit reasons. Say I query the table and use the data in a report. I store @MRID = Max(RowID) at the time the report was generated. Now if the price for 'ABC' on '2016-03-16' is corrected at some later date, I can modify the query using @MRID and replicate the report that I ran earlier.

Thank you for posting ddl and sample data. It makes helping SO MUCH easier. I wish everyone would follow your lead like this!!! — Sean Lange, Mar 21 '16 at 21:21

score 2 · Answer 1 · answered Mar 21 '16 at 21:20

2

I assume you have more than 1 ItemIdentifier in your table. Your design is a bit problematic in that you are keeping versions of the data in your table. You can however do something like this quite easily to get the most recent one for each ItemIdentifier.

with sortedResults as
(
    select *
        , ROW_NUMBER() over(PARTITION by ItemIdentifier order by EffectiveDate desc) as RowNum
    from PriceHistory
)
select *
from sortedResults
where RowNum = 1

answered Mar 21 '16 at 21:20

Sean Lange

33,028
3
25
40

It would if they want only the value for a given ItemIdentifier. – Sean Lange Mar 21 '16 at 21:39
This query is only giving me the latest price of 'ABC'. I need the latest price for each date that exists in the table. The query that I provided produces the correct results: a time series with only one price per date. – c31983 Mar 21 '16 at 22:08
This does NOT limit the results to only show ABC. It will show you the latest price for each ItemIdentifier in the entire table. If you want only a single item then you should use one of TOP suggestions. – Sean Lange Mar 22 '16 at 13:31
Sorry, the point of my last comment was that it is only giving me the latest price - not a daily time series with the most recent price for each date. APH followed up with a modification of your answer that provides the desired result. – c31983 Mar 22 '16 at 14:06

score 2 · Accepted Answer · answered Mar 21 '16 at 22:49

2

A slightly modified version of @SeanLange's answer will give you the last row per date, instead of per product:

with sortedResults as
(
    select *
        , ROW_NUMBER() over(PARTITION by ItemIdentifier, EffectiveDate  
                            ORDER by ID desc) as RowNum
    from PriceHistory
)

select ItemIdentifier, EffectiveDate, Price
from sortedResults
where RowNum = 1
order by 2

answered Mar 21 '16 at 22:49

APH

4,109
1
25
36

1

This is apparently what the OP was looking for. The only suggestion is to not order by ordinal position. It is clearer and safer to order by column name. – Sean Lange Mar 22 '16 at 14:11
@SeanLange I agree. Using ORDER BY with an ordinal position, the code will not be readable enough, and moreover, if you change the order of columns in the SELECT, the query will return wrong results. – Alberto Solano Mar 22 '16 at 20:52

Bruce Dunwiddie · Answer 3 · 2016-03-22T14:56:20.420

1

Short answer, no.

You're hitting the same table twice, and possibly creating a looped table scan, depending on your existing indexes. In the best case, you're causing a looped index seek, and then throwing out most of the rows.

This would be the most efficient query for what you're asking.

SELECT
    L.EffectiveDate,
    L.Price
FROM
    (
        SELECT
            L.EffectiveDate,
            L.Price,
            ROW_NUMBER() OVER (
                PARTITION BY 
                    L.ItemIdentifier, 
                    L.EffectiveDate
                ORDER BY RowID DESC ) RowNum
        FROM [PriceHistory] L
        WHERE L.ItemIdentifier = 'ABC'
    ) L
WHERE
    L.RowNum = 1;

edited Mar 22 '16 at 14:56

answered Mar 21 '16 at 21:26

Bruce Dunwiddie

2,888
1
15
20

The first query you suggested only returns a single record. I need a time series that contains the most recent value for each date. The query that I included in my post produces the correct results. I am not familiar with the CROSS APPLY operator referenced in your second suggesting and am having trouble getting it to execute. Where is [Item] coming from? – c31983 Mar 21 '16 at 22:14
Ok, I fixed it, basically to match how the other answers are already structured. The problem comes if you now try to join this result set to another table or use in a larger query. That's where CROSS APPLY might come into play. I removed that suggestion since it does not sound like that's part of what you're looking for. – Bruce Dunwiddie Mar 22 '16 at 14:58

Efficiently query for the latest version of a record using SQL

3 Answers3