The problem I am trying to solve is that I need to know how many times EmployeeId + TenantId (1A) has shown up on consecutive runs starting at a given date (the current run). I really only care if its greater than 1.
My app runs on business days and inserts rows. It logs the current run date/time and it also adds the date and time from the last time it ran (retrieved from a different table obviously).
Table
CREATE TABLE EmployeeHistory
(
ID INTEGER PRIMARY KEY AUTOINCREMENT,
CurrentRun TEXT,
LastRun TEXT,
EmployeeId TEXT,
TenantId TEXT
);
Sample Data
DELETE FROM EmployeeHistory;
INSERT INTO EmployeeHistory ('CurrentRun', 'LastRun', 'EmployeeId', 'TenantId')
VALUES
-- August 19th run
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '1', 'A' ), -- Consecutive! (Employee exists in August 18th run)
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '3', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '1', 'B' ), -- Should not be included because was not in August 18th run
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '2', 'B' ), -- Consecutive! (Employee exists in August 18th run)
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '4', 'A' ), -- Consecutive! (Employee exists in all runs)
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '2', 'A' ), -- Consecutive! (Employee exists in August 18th run)
-- August 18th run
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '1', 'A' ), -- Consecutive (Employee exists in August 19th run)!
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '2', 'A' ), -- Consecutive (Employee exists in August 19th run)!
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '2', 'B' ), -- Consecutive (Employee exists in August 19th run)!
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '4', 'A' ), -- Consecutive! (Employee exists in all runs)
-- August 17th run
( '2023-08-17 00:00:00.000000', '2023-08-17 00:00:00.000000', '3', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-17 00:00:00.000000', '2023-08-17 00:00:00.000000', '5', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-17 00:00:00.000000', '2023-08-17 00:00:00.000000', '6', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '4', 'A' ); -- Consecutive! (Employee exists in all runs)
Current output :
ID CurrentRun LastRun EmployeeId TenantId
1 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 1 A
2 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 3 A
3 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 1 B
4 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 2 B
5 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 4 A
6 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 2 A
7 2023-08-18 00:00:00.000000 2023-08-17 00:00:00.000000 1 A
8 2023-08-18 00:00:00.000000 2023-08-17 00:00:00.000000 2 A
9 2023-08-18 00:00:00.000000 2023-08-17 00:00:00.000000 2 B
10 2023-08-18 00:00:00.000000 2023-08-17 00:00:00.000000 4 A
11 2023-08-17 00:00:00.000000 2023-08-16 00:00:00.000000 3 A
12 2023-08-17 00:00:00.000000 2023-08-16 00:00:00.000000 5 A
13 2023-08-17 00:00:00.000000 2023-08-16 00:00:00.000000 6 A
14 2023-08-18 00:00:00.000000 2023-08-17 00:00:00.000000 4 A
I have googled and read a bit about LEAD and LAG but I couldn't figure out how to get back the data I wanted.
One solution in my mind would be to add a column to track consecutive occurrences and simply increment the sum for a given employee based on the previous run. Is that a bad approach?