1

What is the fastest way to query for "latest version" on a large SQL table using an update timestamp in SQL Server?

I'm currently using the inner join approach, on very large SQL Server weather forecast table by Date, City, Hour, Temperature, UpdateTimestamp. To get the latest temperature forecast, I created a view using inner join on Date, City, and Hour + max(UpdateTimestamp), such as in this other posting.

However as the dataset is growing on the original table, the view query is getting slower and slower over time.

Wonder if others have encountered similar situation, and what's the best way to speed up this query (one alternative solution I'm considering is having a stored procedure run each day creating a separate table of the "latest version" only, which then will be very quick to access).

EDIT 4/4 - I've found the best solution so far (thanks Vikram) was to add a clustered index to my table on 3 fields "TSUnix", "CityId", "DTUnix", which sped up performance by ~4x (from 25 seconds to 4 seconds)

Also I've tried to use row_number solution (below query sample) , although appears bit slower than the "inner join" approach. Both queries + index creation are below :

Index Creation:

USE [<My DB>]
GO
CREATE NONCLUSTERED INDEX [index_WeatherForecastData]
ON [dbo].[<WeatherForecastData>] ([TSUnix], [CityId], [DTUnix])
INCLUDE ([Temperature], [TemperatureMin], [TemperatureMax], [Humidity], [WindSpeed], [Rain], [Snow])
GO

Query:

-- Inner Join Version

SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*

FROM WeatherForecastData W

INNER JOIN (
    SELECT max(TSUnix) Latest, CityId, DTUnix 
    FROM WeatherForecastData 
    GROUP BY CityId, DTUnix
    ) L
    ON L.Latest = W.TSUnix
    AND L.CityID = W.CityID
    AND L.DTUnix = W.DTUnix

-- Row Number Version

SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*

FROM 
    (
    select 
        *, ROW_NUMBER() over (partition by DTUnix, CityId order by TSUnix desc) as RowNumber
    from WeatherForecastData
    ) W

WHERE
    W.RowNumber = 1

Thanks!

George
  • 11
  • 2
  • 2
    Can you also desc your table structure here ? – Frank AK Mar 30 '18 at 01:38
  • Also read about SQL Indexes and performance improvement here https://stackoverflow.com/questions/5542895/sql-indexes-and-performance-improvement and https://www.red-gate.com/simple-talk/sql/learn-sql-server/using-covering-indexes-to-improve-query-performance/ this is what you want to improve the performance. – Vikram Palakurthi Mar 30 '18 at 01:41
  • While indexes will help your cause, you might want to take a look at sharding for a longer run. – danish Apr 04 '18 at 04:43

1 Answers1

0

Use ROW_NUMBER with an index as shown below.

The specific index that will make this fast is an index that has Date, City, Hour and UpdateTimestamp descending. This requires a single pass over the table rather than multiple passes an INNER JOIN would likely require.

Working code: http://sqlfiddle.com/#!18/8c0b4/1

SELECT Date, City, Hour, Temperature 
FROM
    (SELECT 
         Date, City, Hour, Temperature,
         ROW_NUMBER() OVER(PARTITION BY Date, City, Hour
                           ORDER BY UpdateTimestamp DESC) AS RowNumber
     FROM
         Test) AS t  
WHERE 
    t.RowNumber = 1
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
RnP
  • 390
  • 1
  • 8
  • Thanks for this suggestion - although when using this approach compared to inner join, the search came out to be bit slower (16 seconds vs 30 seconds). Also afterwards I created an index on the table, and it's still bit slower (5 seconds vs 25 seconds). Perhaps I might have done something wrong , I'll update my query into the post. – George Apr 04 '18 at 04:30
  • Can you share the index you created? And a picture of the query plan? – RnP Apr 04 '18 at 11:07
  • I'm not sure how to share the query plan , but I'll add the index query up in the comments. Thanks – George Apr 06 '18 at 01:18