What is the fastest way to query for "latest version" on a large SQL table using an update timestamp in SQL Server?
I'm currently using the inner join
approach, on very large SQL Server weather forecast table by Date, City, Hour, Temperature, UpdateTimestamp. To get the latest temperature forecast, I created a view using inner join
on Date, City, and Hour + max(UpdateTimestamp), such as in this other posting.
However as the dataset is growing on the original table, the view query is getting slower and slower over time.
Wonder if others have encountered similar situation, and what's the best way to speed up this query (one alternative solution I'm considering is having a stored procedure run each day creating a separate table of the "latest version" only, which then will be very quick to access).
EDIT 4/4 - I've found the best solution so far (thanks Vikram) was to add a clustered index to my table on 3 fields "TSUnix", "CityId", "DTUnix", which sped up performance by ~4x (from 25 seconds to 4 seconds)
Also I've tried to use row_number solution (below query sample) , although appears bit slower than the "inner join" approach. Both queries + index creation are below :
Index Creation:
USE [<My DB>]
GO
CREATE NONCLUSTERED INDEX [index_WeatherForecastData]
ON [dbo].[<WeatherForecastData>] ([TSUnix], [CityId], [DTUnix])
INCLUDE ([Temperature], [TemperatureMin], [TemperatureMax], [Humidity], [WindSpeed], [Rain], [Snow])
GO
Query:
-- Inner Join Version
SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*
FROM WeatherForecastData W
INNER JOIN (
SELECT max(TSUnix) Latest, CityId, DTUnix
FROM WeatherForecastData
GROUP BY CityId, DTUnix
) L
ON L.Latest = W.TSUnix
AND L.CityID = W.CityID
AND L.DTUnix = W.DTUnix
-- Row Number Version
SELECT W.TSUnix, W.CityId, W.DTUnix, W.Temperature, W.*
FROM
(
select
*, ROW_NUMBER() over (partition by DTUnix, CityId order by TSUnix desc) as RowNumber
from WeatherForecastData
) W
WHERE
W.RowNumber = 1
Thanks!