1

I am relatively new to SQL and database structures and have a question about what the best way is to go about this.

I have historic change data of objects that will occasionally need to be transformed to snapshots of specific dates.

Example of the data:

ID        Value        UpdateDate

1         4            2017-01-01
2         4            2017-01-03
3         4            2017-01-03
1         7            2017-01-04
2         5            2017-01-08
3         5            2017-01-10
2         8            2017-01-11

I want to be able to create snapshots of a complete overview of all objects at any given date. For example:

   Current Date: 2017-01-01             |      Current Date: 2017-01-04
ID        Value        LastUpdateDate   |   ID        Value        LastUpdateDate            
                                        |
1         4            2017-01-01       |   1         7            2017-01-04
                                        |   2         4            2017-01-03
                                        |   3         4            2017-01-03
__________________________________________________________________________________
                                        |
   Current Date: 2017-01-08             |      Current Date: 2017-01-12
ID        Value        LastUpdateDate   |   ID        Value        LastUpdateDate            
                                        |
1         7            2017-01-04       |   1         7            2017-01-04
2         5            2017-01-08       |   1         8            2017-01-10
3         4            2017-01-03       |   1         5            2017-01-11

I created the following SQL query to create these tables. But I was wondering if this is the most 'efficient' way to do this? My tables in reality are quite large, sometimes 200K-300K unique objects, per object type, with several thousand updates each day (not per object, but just several thousand rows are added to the history table each day, so it's growing in size quite quickly).

The SQL query:

SELECT * INTO @CurrentOverviewTableName 
FROM @HistoryTableName 
INNER JOIN (
    SELECT ID AS ID_T, MAX(LastUpdateDate) AS LastUpdateDate 
    FROM @HistoryTableName
    WHERE LastUpdateDate  <= @OverviewDate
    GROUP BY ID
) ts 
ON S@HistoryTableName.ID = ts.ID_T AND @HistoryTableName.LastUpdateDate = ts.LastUpdateDate_T;

After which I get this for example:

ID        Value        LastUpdateDate        ID_T        LastUpdateDate_T

1         4            2017-01-01            1           2017-01-01
2         4            2017-01-03            2           2017-01-03
3         4            2017-01-03            3           2017-01-03

I drop the last two columns, as they're duplicates, to get what I need. Again, this works, but I'm wondering if it is the best way to do it?

I'm working with an Azure SQL DB.

Mark
  • 75
  • 1
  • 7

2 Answers2

0

Your method is fine. I would use window functions:

SELECT *
INTO @CurrentOverviewTableName 
FROM (SELECT ht.*,
             MAX(LastUpdateDate) OVER (PARTITION BY id) AS max_LastUpdateDate
      FROM @HistoryTableName ht
     ) ht
WHERE max_LastUpdateDate = LastUpdateDate;

If you don't want to delete the column, list out the columns you want, instead of using SELECT *. If you don't want to do that either, move the condition to the WHERE clause:

SELECT ht.*
INTO @CurrentOverviewTableName 
FROM @HistoryTableName ht
WHERE ht.LastUpdateDate = (SELECT MAX(h2.LastUpdateDate)
                           FROM @CurrentOverviewTableName ht2
                           WHERE ht2.id = ht.id
                          );
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Assuming they don't go back in Value you could use the code below. If you want the option of them regressing in Value, then you need an identity column, preferably as the primary key, for the table and you would take the max identity instead. The subquery would change slightly. A primary key would also make the subquery back easier, using just the primary key instead of the two values, ID and value. See below.

--change the date for a different time period
DECLARE @updateDate date = '2017-01-12'
DECLARE @temp TABLE (ID int, Value int, updateDate date)

INSERT INTO @temp VALUES 
 (1, 4, '2017-01-01')
,(2, 4, '2017-01-03')
,(3, 4, '2017-01-03')
,(1, 7, '2017-01-04')
,(2, 5, '2017-01-08')
,(3, 5, '2017-01-10')
,(2, 8, '2017-01-11')

SELECT dT.*
      ,(SELECT T2.updateDate 
        FROM @temp T2 
        WHERE T2.ID = dT.Id AND T2.Value = dT.MaxValue) [LastUpdateDate] 
  FROM (
        SELECT ID, MAX(Value) MaxValue 
          FROM @temp T
         WHERE T.updateDate <= @updateDate
         GROUP By ID
       ) AS dT
Zorkolot
  • 1,899
  • 1
  • 11
  • 8