0

We have a table containing the current state and a history table tracking all changes to the state. Both are normal tables and records in the history table are inserted based on a trigger on the current table. Only the current table is modified.

The history table contains all data. So the latest record for specific id in the history table has the same information as the record in the current table.

We want to query the time when the value of the "state" column changed to the current state (if the current state is not null). The history table keeps records of changes not just to the "state" column, but also to other state columns.

Here are the tables:

Current state table:

id state other_state
1 green aaa
2 red bbb
3 null aaa

History table:

history_id id state other_state valid_from valid_to
8 1 green aaa 2023-06-18 9999-12-31
3 1 null aaa 2023-06-01 2023-06-18
7 2 red aaa 2023-06-16 9999-12-31
6 2 red bbb 2023-06-10 2023-06-16
2 2 null aaa 2023-06-01 2023-06-10
5 3 null aaa 2023-06-14 9999-12-31
4 3 blue aaa 2023-06-10 2023-06-14
1 3 null aaa 2023-06-01 2023-06-10

We want the output to look like this:

id state state_valid_since other_state
1 green 2023-06-18 aaa
2 red 2023-06-10 bbb
3 null null aaa

Since these history tables can be rather large, we are looking for an efficient way to query this. Our first attempts were with a calculating row number partitioned over id and state to find the changed values, but we never got to the correct result. We would appreciate any hints!

Here is a minimal example (but the query is not yet giving the expected results)

Declare @current_table  as table( 
id int, 
state varchar(10),
other_state  varchar(10))
INSERT INTO @current_table
VALUES 
(1, 'green' ,'aaa'),
(2, 'red','aaa'),
(3, null,'aaa')


Declare @history_table  as table( 
history_id int,
id int, 
state varchar(10),
other_state  varchar(10) ,
valid_from  date, 
valid_to  date)
INSERT INTO @history_table
VALUES 
(8, 1, 'green' ,'aaa', '2023-06-18'   , '9999-12-31'),
(3, 1, null,'aaa', '2023-06-01', '2023-06-18'),
(7, 2, 'red','aaa', '2023-06-16', '9999-12-31'),
(6, 2, 'red','bbb', '2023-06-10', '2023-06-16'),
(2, 2, null,'aaa', '2023-06-01', '2023-06-10'),
(5, 3, null,'aaa', '2023-06-14', '9999-12-31'),
(4, 3, 'blue','aaa', '2023-06-10', '2023-06-14'),
(1, 3, null,'aaa', '2023-06-01', '2023-06-10');

WITH changes 
     AS (SELECT Row_number() OVER ( PARTITION BY curr.id, curr.state ORDER BY curr.valid_from DESC) AS rn,
            curr.id, 
            curr.state, 
            curr.valid_from,
            curr.valid_to,
            prev.state      prev_state, 
            prev.valid_to   prev_valid_to 
          FROM 
                @history_table curr
                LEFT JOIN @history_table prev
                ON curr.id = prev.id AND curr.valid_from > prev.valid_from

) 

SELECT * from changes
where rn = 1
order by id, valid_from desc
aem
  • 444
  • 1
  • 4
  • 12

3 Answers3

1

It's not clear from your question if your history table is a normal table that you maintain or is a system-versioned temporal table - if not then perhaps it should be where you can use FOR SYSTEM_TIME to query it?

To get the desired result above you can simply use a correlated subquery:

select 
  id, state, (
    select top (1) valid_to
    from @history_table ht
    where ht.id = ct.id 
      and ct.state is not null
      and (ht.state != ct.state or ht.state is null)
    order by valid_to desc
  ),
  other_state
from @current_table ct;

For performance you'd want the history_table to have an index on id, valid_from.

If you did want to also get the values of other columns you would instead implement in an apply- although it's not really clear from your data or your description if the newest row in the history_table is the previous row of the current_table or not (it should be).

Edit

With some more info about your data I think a different approach is called for. I still think you should be implementing system-versioning here as having a history table that also contains the current row makes things harder.

Another approach would be to use the earliest valid_from date from the most recent block of matching states. A view on the history table can provide a grouped sequence, then your query can make use of the view instead:

create view history_groups as
with d as (
  select *, case when Lag(state) over(partition by id order by valid_from desc) = state then 0 else 1 end diff
  from dbo.history_table
)
select *, Sum(diff) over(partition by id order by valid_from desc) groupNo
from d;

select 
  id, state, (
    select top (1) valid_from
    from history_groups h
    where h.id = ct.id 
      and ct.state is not null
      and h.groupNo = 1
    order by h.valid_from
  ),
  other_state
from current_table ct;
Stu
  • 30,392
  • 6
  • 14
  • 33
  • Thanks for your answer. Both are normal tables, where the history table records are inserted based on a trigger on the current table. The history table always contains all entries, so the latest entry for a id in the history table is the same as in the current table. I will add this information to the question. – aem Jun 18 '23 at 08:20
  • So does the above query not answer your question? – Stu Jun 18 '23 at 08:38
  • Actually. i don't think it does. The history table could contain many rows where other_state changed but "state" did not. But we would like to get the time when state changed to the current value. – aem Jun 18 '23 at 08:42
  • 1
    https://dbfiddle.uk/_5ePhj44 – MatBailie Jun 18 '23 at 08:45
  • 1
    Ah thank you looks like I misunderstood the data that's indeed what it should be. I have amended the above, although happy to remove if you wanted to post your own answer. – Stu Jun 18 '23 at 09:00
  • Your answer is great and simple and i will accept it as answer. I found one case which is not covered by this query (and was not a part of question). If only one row exists in the history table directly after an insert (id 4 in fiddle), this query cannot be applied. I tried to adapt it but failed to come up with an adaptation of your solution. https://dbfiddle.uk/Gv3tSCRr – aem Jun 18 '23 at 15:16
  • @aem perhaps check out this addition to the [fiddle](https://dbfiddle.uk/rC704e0D) and my comments above - does this address your Id#4 issue or give you enough to get there? – Stu Jun 18 '23 at 16:35
  • No need for both the `top (1)` and `sum() over () = 1`. https://dbfiddle.uk/yexVQidS – MatBailie Jun 18 '23 at 21:45
  • Hi, this approach makes it also very easy to accomodate the case where a row is deleted at some point of time and then a row is added with the same data some times later https://dbfiddle.uk/yexVQidS – aem Jun 19 '23 at 10:30
  • Hi, @stu we did not go with system versioned tables since we had some columns which have frequent changes and we wanted to exclude them from versioning, which was simple with triggers. But i think we could move these columns to another table and then enable system versioning. Actually i always thought the system versioned history tables also contain the current state. You are right. I will try to switch to the standard scheme. Thanks! – aem Jun 19 '23 at 10:37
  • Ah sorry. Added the wrong fiddle for the case where an entry was deleted at some point and added later in time: https://dbfiddle.uk/sFSQbbN4 – aem Jun 19 '23 at 11:49
0

data

create table current_table ( 
id int, 
state varchar(10),
other_state  varchar(10))
INSERT INTO current_table
(id,state,other_state)
VALUES 
(1, 'green' ,'aaa'),
(2, 'red','aaa'),
(3, null,'aaa')


create table history_table ( 
history_id int,
id int, 
state varchar(10),
other_state  varchar(10) ,
valid_from  date, 
valid_to  date)
INSERT INTO history_table
(history_id,id,state,other_state,valid_to)
VALUES 
(8, 1, 'green' ,'aaa', '2023-06-18'   , '9999-12-31'),
(3, 1, null,'aaa', '2023-06-01', '2023-06-18'),
(7, 2, 'red','aaa', '2023-06-16', '9999-12-31'),
(6, 2, 'red','bbb', '2023-06-10', '2023-06-16'),
(2, 2, null,'aaa', '2023-06-01', '2023-06-10'),
(5, 3, null,'aaa', '2023-06-14', '9999-12-31'),
(4, 3, 'blue','aaa', '2023-06-10', '2023-06-14'),
(1, 3, null,'aaa', '2023-06-01', '2023-06-10');

you should use aggregate function max function inside window function Row_Number in CTE and to distinguish your values and use JOIN missing NULL values

with t as (
select  
c.id,
c.state, 
max(valid_from) valid_from ,
row_number() over (partition by c.id order by  max(valid_from) asc) idn,
h.other_state 
from current_table c
join history_table h
on c.id=h.id and 
(c.state=h.state or c.state is NULL and h.state is NULL )
group by c.id,c.state,h.other_state
)

select id,state,valid_from,other_state from  t
where idn=1
)

dbfiddle

RF1991
  • 2,037
  • 4
  • 8
  • 17
0

I adapted the solution of @stu and @MatBailie with a groupby approach:

WITH grps
AS (
    SELECT id
        ,STATE
        ,valid_from
        ,valid_to
        ,Row_number() OVER (
            PARTITION BY id ORDER BY valid_from DESC
            ) - Row_number() OVER (
            PARTITION BY STATE
            ,id ORDER BY valid_from DESC
            ) AS g
    FROM dbo.history_table
    )
SELECT id
    ,STATE
    ,(
        SELECT min(valid_from) AS valid_from
        FROM grps
        WHERE g = 0
            AND ct.id = id
            AND ct.STATE IS NOT NULL
        GROUP BY id
        ) AS modified
    ,other_state
FROM current_table ct

dbfiddle

This approach does not consider the case that an entry in the current table is inserted, deleted, and inserted again, so that the valid_to of one column is not the same value as the valid_from row of the next row in the history table.

However, on our table with 5k ids and history table with approx. 200k entries, this query is significantly faster.

aem
  • 444
  • 1
  • 4
  • 12