0

Let's say I have a table labeled employees and it looks like so

date employee id
4/1/2022 9
4/1/2022 8
3/1/2022 9
3/1/2022 8
3/1/2022 7
2/1/2022 9
2/1/2022 8
2/1/2022 7
1/1/2022 9
1/1/2022 8
1/1/2022 7
1/1/2022 6
1/1/2022 5

I'd like for my final output to look like below

date departures
4/1/2022 NULL
3/1/2022 1
2/1/2022 0
1/1/2022 2

Essentially, if the employee is not found on the next upcoming month, it's counting it as a departure for the current month. I already am calculating this with the following query:

WITH MONTHLY_COUNT(DTE, EMP_COUNT) AS(
SELECT
    [DATE],
    COUNT(1)    AS [COUNT]
FROM
    employees A
GROUP BY
    [DATE]),raw_data(EMPLOYEE, DATE) AS(
SELECT
    A.EMPLOYEE,
    [DATE]
FROM
    employees A),RANKING(DTE, EMP_COUNT, RANKING) AS(
SELECT
    A.DTE,
    A.EMP_COUNT,
    A.RANKING
FROM(

    SELECT
        TOP(6) ---pick the most recent x months
        A.DTE,
        A.EMP_COUNT,
        RANK() OVER(ORDER BY A.DTE DESC) AS [RANK]
    FROM 
        MONTHLY_COUNT AS A
)A(DTE, EMP_COUNT, RANKING))SELECT
    A.DTE,
    CASE
        WHEN A.DTE = B.DTE THEN B.SEPARATIONS
        WHEN A.DTE = C.DTE THEN C.SEPARATIONS
        WHEN A.DTE = D.DTE THEN D.SEPARATIONS
    END AS SEPARATIONS

FROM(
    SELECT
        TOP(6) --pick the most recent x months
        A.DTE
    FROM
        MONTHLY_COUNT A
    ORDER BY
        A.DTE DESC
)A

--COMPARE MONTH RANK 2 AND 1
LEFT JOIN(

    SELECT
        A.DTE,
        SUM(CASE WHEN B.EMPLOYEE IS NULL THEN 1 ELSE 0 END) AS SEPARATIONS
    FROM(

        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '2'  ----this is what I want to automate

    )A

    LEFT JOIN(
        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE  
        WHERE A.RANKING = '1' ----this is what I want to automate
    )B ON A.Employee = B.Employee

    GROUP BY
        A.DTE

)B ON B.DTE = A.DTE

--COMPARE MONTH RANK 3 AND 2
LEFT JOIN(

    SELECT
        A.DTE,
        SUM(CASE WHEN B.EMPLOYEE IS NULL THEN 1 ELSE 0 END) AS SEPARATIONS
    FROM(

        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '3'

    )A

    LEFT JOIN(
        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE
        WHERE A.RANKING = '2'
    )B ON A.Employee = B.Employee

    GROUP BY
        A.DTE

)C ON C.DTE = A.DTE
--COMPARE MONTH RANK 4 AND 3
LEFT JOIN(

    SELECT
        A.DTE,
        SUM(CASE WHEN B.EMPLOYEE IS NULL THEN 1 ELSE 0 END) AS SEPARATIONS
    FROM(

        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '4'

    )A

    LEFT JOIN(
        SELECT *
        FROM
            RANKING A
        LEFT JOIN 
            raw_data B ON A.DTE = B.DATE 
        WHERE A.RANKING = '3'
    )B ON A.Employee = B.Employee

    GROUP BY
        A.DTE
)D ON D.DTE = A.DTE ORDER BY A.DTE DESC

The main problem is how can I automate this to where I'm not explicitly comparing months 2 and 1, and then comparing months 3 and 2, and then comparing months 4 and 3, and so on ?

  • Since SQL includes data definition, a [mcve] for an [SQL question](//meta.stackoverflow.com/q/333952/90527) should include [DDL](//en.wikipedia.org/wiki/Data_definition_language) statements for sample tables (rather than an ad hoc table specification) and [DML](//en.wikipedia.org/wiki/Data_manipulation_language) statements for sample data (rather than a dump or ad hoc format). Desired results don't need to be presented as sample code, as results are the output of code and not code themselves. – outis Oct 24 '22 at 21:09
  • 1
    … See the [help] for more on [how to ask good questions](/help/how-to-ask) and many other helpful topics. See also the [tips for asking a good Structured Query Language (SQL) question](//meta.stackoverflow.com/questions/271055). – outis Oct 24 '22 at 21:09
  • Aside: Why are you comparing `RANKING` to a string, e.g. `WHERE A.RANKING = '2'` rather than `WHERE A.RANKING = 2'? – HABO Oct 24 '22 at 21:34
  • Related: "[get summary of differences vs previous month](/q/63737197/90527)" – outis Oct 24 '22 at 22:53

2 Answers2

2

There is no need to really do this recursively - you can do it using sets of data. .

There are two approaches below - they check, for each employee and month, whether that employee has a later record. The first approach uses LEAD to get later records for an individual; the second approach explicitly checks the next month.

The two approaches will differ if a person 'comes back' several months after leaving - the first approach wouldn't treat it as a departure (as they will have a later record), while the second would (as they did not have a record for that specific month).

This db<>fiddle has the SQL below so you can examine it operating.

Initial table setup:

CREATE TABLE #EmpMonths (EmpDate date, EmployeeID int);
INSERT INTO #EmpMonths (EmpDate, EmployeeID) VALUES
('20220401',    9),
('20220401',    8),
('20220301',    9),
('20220301',    8),
('20220301',    7),
('20220201',    9),
('20220201',    8),
('20220201',    7),
('20220101',    9),
('20220101',    8),
('20220101',    7),
('20220101',    6),
('20220101',    5);

As there are no corresponding later records for the 'latest' record (1 April 2022), in this I will report only months prior to the most recent record For this I use MaxDate. Feel free to use any other approach.

DECLARE @MaxEmpDate date;
SET @MaxEmpDate = (SELECT MAX(EmpDate) FROM #EmpMonths);

Approach 1: Using LEAD

The following SQL gets a 2nd date associated with each employee/month record. This has the month they next appear in the dataset. (Note that I have included it here for your info only - it gets used in the answer in the next block of code).

SELECT      EmployeeID, EmpDate, LEAD(EmpDate, 1) OVER (PARTITION BY EmployeeID ORDER BY EmpDate) AS Later_Date
    FROM    #EmpMonths;

You can then use the above to get the departures by month - a NULL in the Later_Date column indicates they didn't have a later record - so you can count NULL records by month to get your answer.

WITH EmpRecords AS
        (SELECT     EmpDate, LEAD(EmpDate, 1) OVER (PARTITION BY EmployeeID ORDER BY EmpDate) AS Later_Date
            FROM    #EmpMonths
        )
    SELECT  EmpDate, COUNT(CASE WHEN Later_Date IS NULL THEN 1 ELSE NULL END) AS Num_Departures
    FROM    EmpRecords
    WHERE   EmpDate < @MaxEmpDate
    GROUP BY EmpDate;

Approach 2: Explicitly checking the next month

The alternative approach is to explicitly check the next month, to see if they have a record for that month. The following does this with a LEFT JOIN on the same table (joining on EmployeeID, and 1 month later).

SELECT      CurrentMonth.EmployeeID, CurrentMonth.EmpDate, NextMonth.EmpDate AS Later_Date
    FROM    #EmpMonths AS CurrentMonth
            LEFT OUTER JOIN #EmpMonths AS NextMonth ON CurrentMonth.EmployeeID = NextMonth.EmployeeID
                                                    AND DATEADD(month, 1, CurrentMonth.EmpDate) = NextMonth.EmpDate
    WHERE   CurrentMonth.EmpDate < @MaxEmpDate;

Similar to above, you can then count NULLs for each month.

WITH EmpRecords AS
        (SELECT     CurrentMonth.EmployeeID, CurrentMonth.EmpDate, NextMonth.EmpDate AS Later_Date
            FROM    #EmpMonths AS CurrentMonth
                    LEFT OUTER JOIN #EmpMonths AS NextMonth ON CurrentMonth.EmployeeID = NextMonth.EmployeeID
                                                            AND DATEADD(month, 1, CurrentMonth.EmpDate) = NextMonth.EmpDate
            WHERE   CurrentMonth.EmpDate < @MaxEmpDate
        )
    SELECT  EmpDate, COUNT(CASE WHEN Later_Date IS NULL THEN 1 ELSE NULL END) AS Num_Departures
    FROM    EmpRecords
    GROUP BY EmpDate;

Results for both are below.

EmpDate      Num_Departures
2022-01-01   2
2022-02-01   0
2022-03-01   1

Update following question in comments

A further clarification/question was put in the comments - how to modify the approach based on

  1. the second option (explicitly setting months) being used, but
  2. sometimes the date is not the first of the month.

To do so, I modified the original data above so that it was on Feb 3rd e.g.,

('20220203',    9),
('20220203',    8),
('20220203',    7),

The simplest option is to make the join to the next month based on the end of month EOMONTH function - note the join now includes EOMONTH(DATEADD(month, 1, CurrentMonth.EmpDate)) = EOMONTH(NextMonth.EmpDate)

WITH EmpRecords AS
        (SELECT     CurrentMonth.EmployeeID, CurrentMonth.EmpDate, NextMonth.EmpDate AS Later_Date
            FROM    #EmpMonths AS CurrentMonth
                    LEFT OUTER JOIN #EmpMonths AS NextMonth ON CurrentMonth.EmployeeID = NextMonth.EmployeeID
                                                            AND EOMONTH(DATEADD(month, 1, CurrentMonth.EmpDate)) = EOMONTH(NextMonth.EmpDate)
            WHERE   CurrentMonth.EmpDate < @MaxEmpDate
        )
    SELECT  EmpDate, COUNT(CASE WHEN Later_Date IS NULL THEN 1 ELSE NULL END) AS Num_Departures
    FROM    EmpRecords
    GROUP BY EmpDate;

Results for this put the set date in the output column though (note that it's now reporting the date as Feb 3).

EmpDate      Num_Departures
2022-01-01   2
2022-02-03   0
2022-03-01   1

An alternative approach is to convert all dates to the first of the month as part of processing - in the below, using DATEFROMPARTS but forcing the 'day' component to be '1'.

Note you need to recalculate the MaxEmpDate too.


SET @MaxEmpDate = (SELECT MAX(DATEFROMPARTS(YEAR(EmpDate), MONTH(EmpDate), 1)) FROM #EmpMonths);

WITH EmpMonths AS
        (SELECT     DATEFROMPARTS(YEAR(EmpDate), MONTH(EmpDate), 1) AS EmpDate_First, EmployeeID
            FROM    #EmpMonths
        ),
    EmpRecords AS
        (SELECT     CurrentMonth.EmployeeID, CurrentMonth.EmpDate_First, NextMonth.EmpDate_First AS Later_Date
            FROM    EmpMonths AS CurrentMonth
                    LEFT OUTER JOIN EmpMonths AS NextMonth ON CurrentMonth.EmployeeID = NextMonth.EmployeeID
                                                            AND DATEADD(month, 1, CurrentMonth.EmpDate_First) = NextMonth.EmpDate_First
            WHERE   CurrentMonth.EmpDate_First < @MaxEmpDate
        )
    SELECT  EmpDate_First AS EmpDate, COUNT(CASE WHEN Later_Date IS NULL THEN 1 ELSE NULL END) AS Num_Departures
    FROM    EmpRecords
    GROUP BY EmpDate_First;

The advantage of the above is that the results appear with the 1st of the month in reporting (e.g., the Feb date is now Feb 1st).

EmpDate      Num_Departures
2022-01-01   2
2022-02-01   0
2022-03-01   1
seanb
  • 6,272
  • 2
  • 4
  • 22
  • Wonderful, I'm using your second approach. I failed to mention that the next month may not be exactly a month from the current month. For example, my current month may be 2022-03-01 but my next month could be 2022-04-03. Would there be a fix around this ? – honey_badgerzz Oct 25 '22 at 14:38
  • 1
    @David - I've added two approaches in within the answer - either joining using EOMONTH or re-calculating the dates as the 1st of each month. – seanb Oct 25 '22 at 23:12
  • Worked like a charm. Thank you so much for revisiting this! – honey_badgerzz Oct 26 '22 at 14:28
2

Try the following :

select [date], SUM(ch) departures
from
(
  select [date], 
       case 
        when [date] = (select max([date]) from table_name)
          then null
        when not exists(select 1 from table_name D where D.[date]=DATEADD(month, 1, T.[date]) and D.employee_id=T.employee_id)
          then 1 
        else 0
      end as ch
  from table_name T
) A
group by [date]
order by [date] desc

See a demo.

when [date] = (select max([date]) from table_name) to select last month values as null since there is no next month.

when not exists(select 1 ... for each row in the table, checks if an employee is not existed in the next month of the current row month, if the condition is met then return 1 else return 0 (the employee is existed in current row month and next month).

ahmed
  • 9,071
  • 3
  • 9
  • 22