-1

I have asked 'Gaps and Islands' questions in the past, but this one is significantly different. I have an interesting question in Microsoft SQL Server 2016 database, T-SQL language. (Refer to image files and T-SQL scripts with temp tables)

I have a table called #EmployeeManagersSource (T-SQL scripts added, see image file at the bottom).

CREATE TABLE #EmployeeManagersSource

(

EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date

);



INSERT INTO #EmployeeManagersSource

VALUES

('Andrew',  2367,   311,    'James',   '1/1/2017',  '1/31/2021'),
('Andrew',  2367,   411,    'Alex',    '2/1/2021',  '7/3/2021'),
('Andrew',  2367,   555,    'Sam',     '7/4/2021',  '2/27/2022'),
('Andrew',  2367,   444,    'Martin',  '7/5/2021',  '2/27/2022'),
('Andrew',  2367,   677,    'Frank',   '2/28/2022', '3/5/2022'),
('Andrew',  2367,   977,    'Whitney', '2/28/2022', '3/5/2022'),
('Andrew',  2367,   845,    'Joe',     '3/6/2022',  '3/15/2022'),
('Andrew',  2367,   652,    'Don',     '3/9/2022',  '3/12/2022'),
('Andrew',  2367,   559,    'Dan',     '3/16/2022', '3/19/2022'),
('Andrew',  2367,   439,    'Autumn',  '3/20/2022', '3/24/2022'),
('Andrew',  2367,   567,    'Melissa', '3/20/2022', '3/26/2022'),
('Andrew',  2367,   233,    'Ben',     '3/27/2022', '3/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '3/31/2022', '4/8/2022'),
('Andrew',  2367,   555,    'Sam',     '4/4/2022',  '4/13/2022'),
('Andrew',  2367,   677,    'Frank',   '4/14/2022', '4/14/2022'),
('Andrew',  2367,   311,    'James',   '4/15/2022', '4/30/2022'),
('Andrew',  2367,   439,    'Autumn',  '4/19/2022', '4/26/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/24/2022', '5/3/2022')

It has a list of employees with their managers. An employee is uniquely identified by the EmployeeId column, and likewise, a manager is uniquely identified by the ManagerId column.

I have a sample of this table, using just one employee with the name Andrew, and his unique identification (i.e. EmployeeId column value ) is 2367. (The #EmployeeManagersSource table has many employees in the real world.)

Andrew has many managers over the years. Andrew works in multiple departments of the company at the same time, so he may report to multiple managers at the same time. If at a point of time, he works only in one department, he will obviously have one manager.

Run the following query in T-SQL to get an idea:

SELECT *
FROM #EmployeeManagersSource

(Ignore the fact that some dates are in the future, the data is fictitious).

I need to develop a dataset as below (T-SQL scripts added, see image file at the bottom):

CREATE TABLE #EmployeeManagersDesiredOutput

(

EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date

);


INSERT INTO #EmployeeManagersDesiredOutput

VALUES

('Andrew',  2367,   311,    'James',   '1/1/2017',  '1/31/2021'),
('Andrew',  2367,   411,    'Alex',    '2/1/2021',  '7/3/2021'),
('Andrew',  2367,   555,    'Sam',     '7/4/2021',  '7/4/2021'),
('Andrew',  2367,   555,    'Sam',     '7/5/2021',  '2/27/2022'),
('Andrew',  2367,   444,    'Martin',  '7/5/2021',  '2/27/2022'),
('Andrew',  2367,   677,    'Frank',   '2/28/2022', '3/5/2022'),
('Andrew',  2367,   977,    'Whitney', '2/28/2022', '3/5/2022'),
('Andrew',  2367,   845,    'Joe',     '3/6/2022',  '3/8/2022'),
('Andrew',  2367,   845,    'Joe',     '3/9/2022',  '3/12/2022'),
('Andrew',  2367,   652,    'Don',     '3/9/2022',  '3/12/2022'),
('Andrew',  2367,   845,    'Joe',     '3/13/2022', '3/15/2022'),
('Andrew',  2367,   559,    'Dan',     '3/16/2022', '3/19/2022'),
('Andrew',  2367,   439,    'Autumn',  '3/20/2022', '3/24/2022'),
('Andrew',  2367,   567,    'Melissa', '3/20/2022', '3/24/2022'),
('Andrew',  2367,   567,    'Melissa', '3/25/2022', '3/26/2022'),
('Andrew',  2367,   233,    'Ben',     '3/27/2022', '3/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '3/31/2022', '4/3/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/4/2022',  '4/8/2022'),
('Andrew',  2367,   555,    'Sam',     '4/4/2022',  '4/8/2022'),
('Andrew',  2367,   555,    'Sam',     '4/9/2022',  '4/13/2022'),
('Andrew',  2367,   677,    'Frank',   '4/14/2022', '4/14/2022'),
('Andrew',  2367,   311,    'James',   '4/15/2022', '4/18/2022'),
('Andrew',  2367,   311,    'James',   '4/19/2022', '4/23/2022'),
('Andrew',  2367,   439,    'Autumn',  '4/19/2022', '4/23/2022'),
('Andrew',  2367,   311,    'James',   '4/24/2022', '4/26/2022'),
('Andrew',  2367,   439,    'Autumn',  '4/24/2022', '4/26/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/24/2022', '4/26/2022'),
('Andrew',  2367,   311,    'James',   '4/27/2022', '4/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '4/27/2022', '4/30/2022'),
('Andrew',  2367,   399,    'Lisa',    '5/1/2022',  '5/3/2022')

SELECT *
FROM #EmployeeManagersDesiredOutput

The lowest level of granularity for time in this scenario is a calendar day.

If Andrew reports to multiple managers on a given calendar day, it must be shown in separate rows with the overlapping period and the range associated. Please run the following queries to get an idea.

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '7/5/2021'
AND EndDate <= '2/27/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/9/2022'
AND EndDate <= '3/12/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/20/2022'
AND EndDate <= '3/24/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/4/2022'
AND EndDate <= '4/8/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/19/2022'
AND EndDate <= '4/23/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/24/2022'
AND EndDate <= '4/26/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/27/2022'
AND EndDate <= '4/30/2022'

Other rows that do not belong to the overlap must be separated, by adding or subtracting a day.

For example:

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate = '7/4/2021'
AND EndDate = '7/4/2021'


SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/6/2022'
AND EndDate <= '3/8/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/13/2022'
AND EndDate <= '3/15/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/31/2022'
AND EndDate <= '4/3/2022'


SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/15/2022'
AND EndDate <= '4/18/2022'

SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '5/1/2022'
AND EndDate <= '5/3/2022'

I have the following criteria (for any given employee, such as Andrew) (refer to the #EmployeeManagersSource table):

  1. StartDate is the same for multiple managers (eg: Autumn and Melissa, Mar 20,2022), but EndDate varies
  2. StartDate varies for multiple managers, but EndDate is the same for those managers (eg: Sam and Martin, Feb 27, 2022).
  3. Partial overlap between StartDate and EndDate between multiple managers. (eg: Lisa (Mar 31, 2022 to Apr 8, 2022) and Sam (Apr 4, 2022 to Apr 13, 2022))
  4. StartDate and EndDate of a manager are entirely a subset of those of another manager. (eg: Joe (Mar 6, 2022 to Mar 15, 2022) and Don (Mar 9, 2022 to Mar 12, 2022))
  5. Sometimes even 3 managers handle at the same time (eg: James, Autumn, Lisa (Apr 24, 2022 to Apr 26, 2022))
  6. StartDate and EndDate of a manager exactly matches another manager (eg: Frank and Whitney, Feb 28, 2022 to Mar 5, 2022)
  7. Just one manager in a period (eg: Dan, Mar 16, 2022 to Mar 19, 2022)

In case (6) and (7), data can be moved from the #EmployeeManagersSource table to the #EmployeeManagersDesiredOutput without any change.

Any idea on how to convert #EmployeeManagersSource to #EmployeeManagersDesiredOutput ?

enter image description here

enter image description here

My approach:

Develop a temp table (#EmployeeManagersIntermediate) with exploded dates, using the #EmployeeManagersSource table as source:

;
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
    ,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
    ,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
    ,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
    ,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
    ,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
    ,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
    ,DateRange AS
(
    SELECT ExplodedDate = DATEADD(DAY,N - 1,'1960-01-01')
    FROM cteTally
    WHERE N <= 365000
)
SELECT EmployeeName, EmployeeId, ManagerId, ManagerName, StartDate, EndDate, CONVERT(date,ExplodedDate) AS ExplodedDate
INTO #EmployeeManagersIntermediate
FROM #EmployeeManagersSource eh
JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate]
 AND d.ExplodedDate <= eh.[EndDate];
 

 SELECT *
 FROM #EmployeeManagersIntermediate
 WHERE ManagerName = 'Lisa'

However, am not able to get the #EmployeeManagersDesiredOutput using #EmployeeManagersIntermediate, based on this 'Gaps and Islands' problem here.

I think I need a proper PARTITION BY clause. Can someone suggest a solution on how change #EmployeeManagersIntermediate to #EmployeeManagersDesiredOutput ?

user3812887
  • 439
  • 12
  • 33
  • @LukStorms any idea on this ? this is significantly different from the one you answered last time. – user3812887 Apr 02 '22 at 02:58
  • @Charlieface any help on this ? – user3812887 Apr 02 '22 at 02:59
  • Please don't ping specific people - [this isn't Gotham City](https://cdnmetv.metv.com/rt4pZ-1497540968-1589-blog-Header.jpg). – Aaron Bertrand Apr 02 '22 at 12:05
  • @Aaron Bertrand sorry for it; i will not ping specific people in the future; also, do you know the reason why 2 people have asked to close this ? I have been as clear and focused as possible. – user3812887 Apr 02 '22 at 14:54
  • Hello Friends, I have been clear and concise with my question. Not sure why 2 people have asked for closing this useful question. Please let me know if I need to modify something. – user3812887 Apr 02 '22 at 14:58
  • I have trimmed down and removed unwanted sentences from my post. Is it possible to undo the 2 votes that were meant to close this post ? I have been as clear as possible. – user3812887 Apr 02 '22 at 23:07
  • Although you find this problem fascinating and even more (absolutely) unique than other problems, you've failed to mention what you've tried. Somewhat like your prior questions that have sample data, desired results, DDL, scripts, but without a _specific problem_ that you encountered with _your code_ trying to solve the problem. Have you tried something like joining with a tally table to expand the data set into daily rows for each combination of employee and manager, then collapsing them down based on overlaps? It's brute force, but may provide some insight. – HABO Apr 03 '22 at 02:09
  • @HABO I tried a few techniques; i know how to expand dates between a minimum and maximum date, but am missing a crucial idea in this one. Is there anything you can suggest using a CTE or something. – user3812887 Apr 03 '22 at 03:09
  • @HABO I think i am getting an answer with a daily date row, but is there a short-cut method ? The daily date option is my last resort – user3812887 Apr 03 '22 at 04:25
  • @HABO I have made good progress here. Thanks for your suggestion to use exploded dates; am inching closer. Can you suggest a technique to change #EmployeeManagersIntermediate to #EmployeeManagersDesiredOutput ? – user3812887 Apr 03 '22 at 17:12
  • @Razvan Socol this code works perfect; i will get back in a day or two, and if this works fine with the full database, we can make this the answer; thanks to all of you for suggesting solutions, especially Razvan Socol; There is no need for me to develop an intermediate table called #EmployeeManagersIntermediate ! – user3812887 Apr 03 '22 at 19:43

1 Answers1

1

You can use the following query:

;WITH Dates AS (
    SELECT EmployeeId, EmployeeName, SomeDate,
        ROW_NUMBER() OVER (PARTITION BY x.EmployeeId ORDER BY x.SomeDate) AS RowNum
    FROM (
        SELECT EmployeeId, EmployeeName, StartDate AS SomeDate
        FROM #EmployeeManagersSource
        UNION
        SELECT EmployeeId, EmployeeName, DATEADD(DAY,1,EndDate)
        FROM #EmployeeManagersSource
    ) x
), Intervals AS (
    SELECT d1.EmployeeId, d1.EmployeeName, 
        d1.SomeDate AS StartDate, DATEADD(DAY,-1,d2.SomeDate) AS EndDate
    FROM Dates d1 
    INNER JOIN Dates d2 
    ON d2.EmployeeId = d1.EmployeeId AND d1.RowNum=d2.RowNum-1
)
SELECT i.EmployeeName, i.EmployeeId, s.ManagerId, s.ManagerName, i.StartDate, i.EndDate
FROM Intervals i
INNER JOIN #EmployeeManagersSource s 
ON s.EmployeeId = i.EmployeeId AND s.StartDate<=i.StartDate AND s.EndDate>=i.EndDate

The first CTE builds a list of dates when changes appear, the second CTE builds the intervals, the final query finds the mangers for each interval.

Razvan Socol
  • 5,426
  • 2
  • 20
  • 32
  • thanks very much, appreciate it; it works fine with this sample data; i will test this code with the full database; hopefully tomorrow or the day after i will mark your answer. Perfect thanks – user3812887 Apr 03 '22 at 19:41
  • 1
    If it works slowly on more data, you can create temporary tables for `Dates` and `Intervals` (instead of using CTE-s). Maybe add an index on `EmployeeID` in `#Intervals`. – Razvan Socol Apr 04 '22 at 04:53
  • thanks very much; it works in the full database; this is a great solution – user3812887 Apr 05 '22 at 01:56