I have asked 'Gaps and Islands' questions in the past, but this one is significantly different. I have an interesting question in Microsoft SQL Server 2016 database, T-SQL language. (Refer to image files and T-SQL scripts with temp tables)
I have a table called #EmployeeManagersSource (T-SQL scripts added, see image file at the bottom).
CREATE TABLE #EmployeeManagersSource
(
EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date
);
INSERT INTO #EmployeeManagersSource
VALUES
('Andrew', 2367, 311, 'James', '1/1/2017', '1/31/2021'),
('Andrew', 2367, 411, 'Alex', '2/1/2021', '7/3/2021'),
('Andrew', 2367, 555, 'Sam', '7/4/2021', '2/27/2022'),
('Andrew', 2367, 444, 'Martin', '7/5/2021', '2/27/2022'),
('Andrew', 2367, 677, 'Frank', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 977, 'Whitney', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 845, 'Joe', '3/6/2022', '3/15/2022'),
('Andrew', 2367, 652, 'Don', '3/9/2022', '3/12/2022'),
('Andrew', 2367, 559, 'Dan', '3/16/2022', '3/19/2022'),
('Andrew', 2367, 439, 'Autumn', '3/20/2022', '3/24/2022'),
('Andrew', 2367, 567, 'Melissa', '3/20/2022', '3/26/2022'),
('Andrew', 2367, 233, 'Ben', '3/27/2022', '3/30/2022'),
('Andrew', 2367, 399, 'Lisa', '3/31/2022', '4/8/2022'),
('Andrew', 2367, 555, 'Sam', '4/4/2022', '4/13/2022'),
('Andrew', 2367, 677, 'Frank', '4/14/2022', '4/14/2022'),
('Andrew', 2367, 311, 'James', '4/15/2022', '4/30/2022'),
('Andrew', 2367, 439, 'Autumn', '4/19/2022', '4/26/2022'),
('Andrew', 2367, 399, 'Lisa', '4/24/2022', '5/3/2022')
It has a list of employees with their managers. An employee is uniquely identified by the EmployeeId column, and likewise, a manager is uniquely identified by the ManagerId column.
I have a sample of this table, using just one employee with the name Andrew, and his unique identification (i.e. EmployeeId column value ) is 2367. (The #EmployeeManagersSource table has many employees in the real world.)
Andrew has many managers over the years. Andrew works in multiple departments of the company at the same time, so he may report to multiple managers at the same time. If at a point of time, he works only in one department, he will obviously have one manager.
Run the following query in T-SQL to get an idea:
SELECT *
FROM #EmployeeManagersSource
(Ignore the fact that some dates are in the future, the data is fictitious).
I need to develop a dataset as below (T-SQL scripts added, see image file at the bottom):
CREATE TABLE #EmployeeManagersDesiredOutput
(
EmployeeName varchar(50),
EmployeeId int,
ManagerId int,
ManagerName varchar(50),
StartDate date,
EndDate date
);
INSERT INTO #EmployeeManagersDesiredOutput
VALUES
('Andrew', 2367, 311, 'James', '1/1/2017', '1/31/2021'),
('Andrew', 2367, 411, 'Alex', '2/1/2021', '7/3/2021'),
('Andrew', 2367, 555, 'Sam', '7/4/2021', '7/4/2021'),
('Andrew', 2367, 555, 'Sam', '7/5/2021', '2/27/2022'),
('Andrew', 2367, 444, 'Martin', '7/5/2021', '2/27/2022'),
('Andrew', 2367, 677, 'Frank', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 977, 'Whitney', '2/28/2022', '3/5/2022'),
('Andrew', 2367, 845, 'Joe', '3/6/2022', '3/8/2022'),
('Andrew', 2367, 845, 'Joe', '3/9/2022', '3/12/2022'),
('Andrew', 2367, 652, 'Don', '3/9/2022', '3/12/2022'),
('Andrew', 2367, 845, 'Joe', '3/13/2022', '3/15/2022'),
('Andrew', 2367, 559, 'Dan', '3/16/2022', '3/19/2022'),
('Andrew', 2367, 439, 'Autumn', '3/20/2022', '3/24/2022'),
('Andrew', 2367, 567, 'Melissa', '3/20/2022', '3/24/2022'),
('Andrew', 2367, 567, 'Melissa', '3/25/2022', '3/26/2022'),
('Andrew', 2367, 233, 'Ben', '3/27/2022', '3/30/2022'),
('Andrew', 2367, 399, 'Lisa', '3/31/2022', '4/3/2022'),
('Andrew', 2367, 399, 'Lisa', '4/4/2022', '4/8/2022'),
('Andrew', 2367, 555, 'Sam', '4/4/2022', '4/8/2022'),
('Andrew', 2367, 555, 'Sam', '4/9/2022', '4/13/2022'),
('Andrew', 2367, 677, 'Frank', '4/14/2022', '4/14/2022'),
('Andrew', 2367, 311, 'James', '4/15/2022', '4/18/2022'),
('Andrew', 2367, 311, 'James', '4/19/2022', '4/23/2022'),
('Andrew', 2367, 439, 'Autumn', '4/19/2022', '4/23/2022'),
('Andrew', 2367, 311, 'James', '4/24/2022', '4/26/2022'),
('Andrew', 2367, 439, 'Autumn', '4/24/2022', '4/26/2022'),
('Andrew', 2367, 399, 'Lisa', '4/24/2022', '4/26/2022'),
('Andrew', 2367, 311, 'James', '4/27/2022', '4/30/2022'),
('Andrew', 2367, 399, 'Lisa', '4/27/2022', '4/30/2022'),
('Andrew', 2367, 399, 'Lisa', '5/1/2022', '5/3/2022')
SELECT *
FROM #EmployeeManagersDesiredOutput
The lowest level of granularity for time in this scenario is a calendar day.
If Andrew reports to multiple managers on a given calendar day, it must be shown in separate rows with the overlapping period and the range associated. Please run the following queries to get an idea.
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '7/5/2021'
AND EndDate <= '2/27/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/9/2022'
AND EndDate <= '3/12/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/20/2022'
AND EndDate <= '3/24/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/4/2022'
AND EndDate <= '4/8/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/19/2022'
AND EndDate <= '4/23/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/24/2022'
AND EndDate <= '4/26/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/27/2022'
AND EndDate <= '4/30/2022'
Other rows that do not belong to the overlap must be separated, by adding or subtracting a day.
For example:
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate = '7/4/2021'
AND EndDate = '7/4/2021'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/6/2022'
AND EndDate <= '3/8/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/13/2022'
AND EndDate <= '3/15/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '3/31/2022'
AND EndDate <= '4/3/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '4/15/2022'
AND EndDate <= '4/18/2022'
SELECT *
FROM #EmployeeManagersDesiredOutput
WHERE StartDate >= '5/1/2022'
AND EndDate <= '5/3/2022'
I have the following criteria (for any given employee, such as Andrew) (refer to the #EmployeeManagersSource table):
- StartDate is the same for multiple managers (eg: Autumn and Melissa, Mar 20,2022), but EndDate varies
- StartDate varies for multiple managers, but EndDate is the same for those managers (eg: Sam and Martin, Feb 27, 2022).
- Partial overlap between StartDate and EndDate between multiple managers. (eg: Lisa (Mar 31, 2022 to Apr 8, 2022) and Sam (Apr 4, 2022 to Apr 13, 2022))
- StartDate and EndDate of a manager are entirely a subset of those of another manager. (eg: Joe (Mar 6, 2022 to Mar 15, 2022) and Don (Mar 9, 2022 to Mar 12, 2022))
- Sometimes even 3 managers handle at the same time (eg: James, Autumn, Lisa (Apr 24, 2022 to Apr 26, 2022))
- StartDate and EndDate of a manager exactly matches another manager (eg: Frank and Whitney, Feb 28, 2022 to Mar 5, 2022)
- Just one manager in a period (eg: Dan, Mar 16, 2022 to Mar 19, 2022)
In case (6) and (7), data can be moved from the #EmployeeManagersSource table to the #EmployeeManagersDesiredOutput without any change.
Any idea on how to convert #EmployeeManagersSource to #EmployeeManagersDesiredOutput ?
My approach:
Develop a temp table (#EmployeeManagersIntermediate) with exploded dates, using the #EmployeeManagersSource table as source:
;
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1)
,E02(N) AS (SELECT 1 FROM E00 a, E00 b)
,E04(N) AS (SELECT 1 FROM E02 a, E02 b)
,E08(N) AS (SELECT 1 FROM E04 a, E04 b)
,E16(N) AS (SELECT 1 FROM E08 a, E08 b)
,E32(N) AS (SELECT 1 FROM E16 a, E16 b)
,cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E32)
,DateRange AS
(
SELECT ExplodedDate = DATEADD(DAY,N - 1,'1960-01-01')
FROM cteTally
WHERE N <= 365000
)
SELECT EmployeeName, EmployeeId, ManagerId, ManagerName, StartDate, EndDate, CONVERT(date,ExplodedDate) AS ExplodedDate
INTO #EmployeeManagersIntermediate
FROM #EmployeeManagersSource eh
JOIN DateRange d ON d.ExplodedDate >= eh.[StartDate]
AND d.ExplodedDate <= eh.[EndDate];
SELECT *
FROM #EmployeeManagersIntermediate
WHERE ManagerName = 'Lisa'
However, am not able to get the #EmployeeManagersDesiredOutput using #EmployeeManagersIntermediate, based on this 'Gaps and Islands' problem here.
I think I need a proper PARTITION BY clause. Can someone suggest a solution on how change #EmployeeManagersIntermediate to #EmployeeManagersDesiredOutput ?