-1

The problem I am trying to solve is that I need to know how many times EmployeeId + TenantId (1A) has shown up on consecutive runs starting at a given date (the current run). I really only care if its greater than 1.

My app runs on business days and inserts rows. It logs the current run date/time and it also adds the date and time from the last time it ran (retrieved from a different table obviously).

Table

CREATE TABLE EmployeeHistory
(
   ID INTEGER PRIMARY KEY AUTOINCREMENT,
   CurrentRun TEXT,
   LastRun TEXT,
   EmployeeId TEXT,
   TenantId TEXT
);

Sample Data

DELETE FROM EmployeeHistory;

INSERT INTO EmployeeHistory ('CurrentRun', 'LastRun', 'EmployeeId', 'TenantId')

VALUES

-- August 19th run
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '1', 'A' ), -- Consecutive! (Employee exists in August 18th run)
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '3', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '1', 'B' ), -- Should not be included because was not in August 18th run
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '2', 'B' ), -- Consecutive! (Employee exists in August 18th run)
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '4', 'A' ), -- Consecutive! (Employee exists in all runs)
( '2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '2', 'A' ), -- Consecutive! (Employee exists in August 18th run)

-- August 18th run
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '1', 'A' ), -- Consecutive (Employee exists in August 19th run)!
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '2', 'A' ), -- Consecutive (Employee exists in August 19th run)!
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '2', 'B' ), -- Consecutive (Employee exists in August 19th run)!
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '4', 'A' ), -- Consecutive! (Employee exists in all runs)

-- August 17th run
( '2023-08-17 00:00:00.000000', '2023-08-17 00:00:00.000000', '3', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-17 00:00:00.000000', '2023-08-17 00:00:00.000000', '5', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-17 00:00:00.000000', '2023-08-17 00:00:00.000000', '6', 'A' ), -- Should not be included because was not in August 18th run
( '2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '4', 'A' ); -- Consecutive! (Employee exists in all runs)

Current output :

ID  CurrentRun                  LastRun                     EmployeeId  TenantId
1   2023-08-19 00:00:00.000000  2023-08-18 00:00:00.000000  1           A
2   2023-08-19 00:00:00.000000  2023-08-18 00:00:00.000000  3           A
3   2023-08-19 00:00:00.000000  2023-08-18 00:00:00.000000  1           B
4   2023-08-19 00:00:00.000000  2023-08-18 00:00:00.000000  2           B
5   2023-08-19 00:00:00.000000  2023-08-18 00:00:00.000000  4           A
6   2023-08-19 00:00:00.000000  2023-08-18 00:00:00.000000  2           A
7   2023-08-18 00:00:00.000000  2023-08-17 00:00:00.000000  1           A
8   2023-08-18 00:00:00.000000  2023-08-17 00:00:00.000000  2           A
9   2023-08-18 00:00:00.000000  2023-08-17 00:00:00.000000  2           B
10  2023-08-18 00:00:00.000000  2023-08-17 00:00:00.000000  4           A
11  2023-08-17 00:00:00.000000  2023-08-16 00:00:00.000000  3           A
12  2023-08-17 00:00:00.000000  2023-08-16 00:00:00.000000  5           A
13  2023-08-17 00:00:00.000000  2023-08-16 00:00:00.000000  6           A
14  2023-08-18 00:00:00.000000  2023-08-17 00:00:00.000000  4           A

I have googled and read a bit about LEAD and LAG but I couldn't figure out how to get back the data I wanted.

One solution in my mind would be to add a column to track consecutive occurrences and simply increment the sum for a given employee based on the previous run. Is that a bad approach?

Victorio Berra
  • 2,760
  • 2
  • 28
  • 53

2 Answers2

0

You could simply use the EXISTS clause; this would give you only those instances where the same EmployeeId+TenantId occured under CurrentRun +/- 1day:

select *
from EmployeeHistory A
where exists (
           select 1
           from EmployeeHistory B
           where A.EmployeeId=B.EmployeeId
             and A.TenantId=B.TenantId
             and (date(A.CurrentRun) = date(B.CurrentRun, '+1 day')
                  or 
                  date(A.CurrentRun) = date(B.CurrentRun, '-1 day'))
             )

Why do you keep everything in 'TEXT' columns? you might need to adjust for data-type conversions.

Or... if you want to keep all rows in the output, but simply mark the consecutive ones:

select A.*
  ,case when 
            date(CurrentRun, '-1 day')=date(LAG(CurrentRun) over (partition by EmployeeId, TenantId order by CurrentRun)) 
         or date(CurrentRun, '+1 day')=date(LEAD(CurrentRun) over (partition by EmployeeId, TenantId order by CurrentRun)) 
     then 1 
     else 0 
   end as Consecutive
from EmployeeHistory A
tinazmu
  • 3,880
  • 2
  • 7
  • 20
  • My real EmployeeIds are not numbers so that one has to be TEST, ass for the dates with SQLIte I read it does not have a datetime column, has that changed? – Victorio Berra Aug 21 '23 at 14:23
  • Also does this assume the data is already ordered by the dates correctly like in my sample? Should I add some ordering to be safe? – Victorio Berra Aug 21 '23 at 14:32
  • İf you want the output to be ordered in a particular way, yes, you should add `order by ...`. As for datetime datatype: apologies, I am not familiar with sqlite, you are probably right. – tinazmu Aug 21 '23 at 21:30
0
CREATE TABLE EmployeeHistory
(
   ID INTEGER PRIMARY KEY AUTOINCREMENT,
   CurrentRun TEXT,
   LastRun TEXT,
   EmployeeId TEXT,
   TenantId TEXT
);

✓``` INSERT INTO EmployeeHistory ('CurrentRun', 'LastRun', 'EmployeeId', 'TenantId') VALUES ('2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '1', 'A'), ('2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '3', 'A'), ('2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '1', 'B'), ('2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '2', 'B'), ('2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '4', 'A'), ('2023-08-19 00:00:00.000000', '2023-08-18 00:00:00.000000', '2', 'A'), ('2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '1', 'A'), ('2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '2', 'A'), ('2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '2', 'B'), ('2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '4', 'A'), ('2023-08-17 00:00:00.000000', '2023-08-16 00:00:00.000000', '3', 'A'), ('2023-08-17 00:00:00.000000', '2023-08-16 00:00:00.000000', '5', 'A'), ('2023-08-17 00:00:00.000000', '2023-08-16 00:00:00.000000', '6', 'A'), ('2023-08-18 00:00:00.000000', '2023-08-17 00:00:00.000000', '4', 'A');

✓``` 
SELECT a.*
FROM EmployeeHistory a
JOIN EmployeeHistory b
  ON a.EmployeeId = b.EmployeeId
  AND a.TenantId = b.TenantId
  AND a.LastRun = b.CurrentRun
WHERE a.CurrentRun >= '2023-08-17 00:00:00.000000' -- Starting date for the runs
ORDER BY a.CurrentRun DESC;

ID CurrentRun LastRun EmployeeId TenantId
1 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 1 A
4 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 2 B
5 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 4 A
5 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 4 A
6 2023-08-19 00:00:00.000000 2023-08-18 00:00:00.000000 2 A
WITH RankedRuns AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY EmployeeId, TenantId ORDER BY CurrentRun DESC) AS rn
  FROM EmployeeHistory
)
SELECT a.ID, a.CurrentRun, a.LastRun, a.EmployeeId, a.TenantId
FROM RankedRuns a
JOIN RankedRuns b
  ON a.EmployeeId = b.EmployeeId
  AND a.TenantId = b.TenantId
  AND a.LastRun = b.CurrentRun


| ID | CurrentRun | LastRun | EmployeeId | TenantId | | :---|:-----------|:--------|:-----------|:---------| | 1 | 2023-08-19 00:00:00.000000 | 2023-08-18 00:00:00.000000 | 1 | A | | 6 | 2023-08-19 00:00:00.000000 | 2023-08-18 00:00:00.000000 | 2 | A | | 4 | 2023-08-19 00:00:00.000000 | 2023-08-18 00:00:00.000000 | 2 | B | | 5 | 2023-08-19 00:00:00.000000 | 2023-08-18 00:00:00.000000 | 4 | A | | 5 | 2023-08-19 00:00:00.000000 | 2023-08-18 00:00:00.000000 | 4 | A | fiddle

Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60