-1

Long time lurker, first time asking a question. If I make any etiquette mistakes, please correct me.

Goal: Get the time an individual started and ended a position based on their job code and team code. So if someone left a position, then later returned to the position I would like it to give me two "Start Dates" in that position.


Data Table / Format: My data table includes a weekly snapshot taken every Sunday of every employee, their employeeID, their job code, their team code, and then other stuff. Here's a (fake) example. Variable types in ().

 EID    JobCode   TeamCode   weekstartdate
(INT)  (VARCHAR)   (INT)      (DATETIME)
1     INT111      142        2022-02-20 00:00:00.00
1     INT111      142        2022-02-27 00:00:00.00

I have tried something along the lines of

Select EID,Jobcode,teamcode,MIN(weekstartdate)
from #Dataset
group by EID,Jobcode,Teamcode

The struggle is that if someone has left and returned to the job, it would only show their original weekstartdate. Where I need only uninterrupted time in the position.

If anyone can point me towards functions or features I don't know about that could help with this, I would appreciate it.

Charlieface
  • 52,284
  • 6
  • 19
  • 43
  • I think I understand what you are looking for but not entirely. Can you provide a little bit more sample data that demonstrates the whole issue? – Sean Lange Jul 27 '22 at 17:44
  • I'm guessing you are looking for at Gaps-and-Islands solution, but the sample data is a bit thin. Hard to determine what a GAP is. (1 week / 7 days ? ) – John Cappelletti Jul 27 '22 at 17:51
  • Does [this](https://stackoverflow.com/questions/63666532/group-islands-of-contiguous-dates-including-missing-weekends) help? – Sean Lange Jul 27 '22 at 17:57
  • I think you are saying that your table is a series of snapshots with weekly granularity. Can we assume that if an employee leaves, then returns in the same week, you don't need to capture that? Because the existing model provides no way to do that. – allmhuran Jul 27 '22 at 18:06
  • "If I make any etiquette mistakes, please correct me" I would consider "Not Responding" and faux pas. – John Cappelletti Jul 27 '22 at 18:15
  • On the other hand, you don't have to answer me immediately. People go to the shower, take lunches, drive cars, have meetings, and do other things that can easily take time and might mean they cannot sit and watch their question constantly. It would be rude of me to have such an expectation. – allmhuran Jul 27 '22 at 18:24
  • Hey! Thank you to all the comments. I checked back after about 15-30 minutes, then got focused on something else. To respond to comments: I apologize for the thin data set. I will try to edit to provide more data. Gaps-and-Islands Yes! I was unfamiliar with that term, but that's exactly what this is - looking for the start of each 'Island'. A Gap in this case would be 1 or more sundays they do not appear in the role. (ie 2 weeks weeks or longer between sundays in the role). – SuperMF94 Jul 27 '22 at 20:22
  • If an employee were to leave and return within the same week, I think I'd be fine skipping over that. – SuperMF94 Jul 27 '22 at 20:23
  • And yes, a series of snapshots with weekly granularity. – SuperMF94 Jul 27 '22 at 20:23

2 Answers2

1

This seems to be a type of gaps and islands problem - with some assumptions that you should validate (a row for Sunday only).

Following is one example that is a bit overcomplicated but I started with pre-existing code. ROW_NUMBER is not required here but I use/include it for debugging and learning purposes. You use LAG to get the previous start date in each row and then simply test whether that value is seven days prior to the start date column. If it is, then the row is a "continuation" of the prior row.

with cte as (
   select *, 
         row_number() over (partition by EID, JobCode, TeamCode order by weekstartdate) as rno, 
         lag (weekstartdate) over (partition by EID, JobCode, TeamCode order by weekstartdate) as prevstart 
   from empdata
), 
blocks as (
   select *, 
          case when dateadd(day, 7, prevstart) = weekstartdate then 0 else 1 end as islandstart 
   from cte
)
select * from blocks 
where islandstart = 1 
order by EID, JobCode, TeamCode, rno
;

Remove the WHERE clause to see what values are generated for all rows to help understand how it works. fiddle to demonstrate.

SMor
  • 2,830
  • 4
  • 11
  • 14
  • A coworker actually was suggesting something like this to me, but with lead rather than lag. I'm going to try implementing it in a bit. Thank you for this! And your notes on the row_number for debugging! I'll update once I get a chance to build this out! – SuperMF94 Jul 27 '22 at 20:14
  • Okay, I just implemented this solution and it appears to be working fantastically! I need to QC it with a few examples, but thank you. You are a gentleman and a scholar! (Or gentlewoman and a scholar) – SuperMF94 Jul 27 '22 at 20:52
0

I'm sure there's a much simpler way to do this, but food for thought in any case

http://sqlfiddle.com/#!18/05e76/35

create table your_table 
  ( eid varchar(10),
    jobCode varchar(10),
    teamCode integer,
    weekstartdate date
  );

insert into your_table values 
  ('1','INT111',142,'2020-01-01'),
  ('1','INT111',142,'2020-01-29'), 
  ('2','INT111',142,'2020-01-01'),
  ('2','INT111',142,'2020-01-08'),
  ('2','INT111',142,'2020-01-15'),
  ('2','INT111',142,'2020-01-22'),
  ('2','INT111',142,'2020-01-29')

with all_week_starts as
  ( select distinct 
           weekstartdate
      from your_table
  ),
  all_combinations as 
  ( select distinct 
           eid,
           jobCode,
           teamCode,
           aws.weekstartdate
      from your_table
     cross
      join all_week_starts aws
  )
select eid,
       jobCode,
       TeamCode,
       weekstartdate as first_week_employed
  from (
        select eid,
               jobCode,
               TeamCode,
               employed_this_week,
               lag(employed_this_week) over (partition by 
                             eid,
                             jobCode,
                             TeamCode
                   order by weekstartdate) as employed_last_week,
               weekstartdate
          from (
                select ac.eid,
                       ac.jobCode,
                       ac.TeamCode,
                       ac.weekstartdate,
                       case when yt.weekstartdate is not null then 1 else 0 end as employed_this_week
                  from all_combinations ac
                  left
                  join your_table yt
                    on ac.weekstartdate = yt.weekstartdate
                   and ac.eid = yt.eid
                   and ac.jobCode = yt.jobCode
                   and ac.teamCode = yt.teamCode
               ) as flagged_employment_gaps
        ) as tmp2
 where employed_this_week = 1
   and coalesce(employed_last_week,0) = 0
order
   by eid,
      jobCode,
      teamCode,
      first_week_employed
Error_2646
  • 2,555
  • 1
  • 10
  • 22
  • I'll definitely take a look into this! Never bad to have multiple ways to solve any problem. I appreciate the feedback! – SuperMF94 Jul 27 '22 at 20:15
  • @SuperMF94 Sure thing! Another bit of food for thought, one reason I wrote it like this over the more straightfoward date_add(+7) is that I've been burned in the past by naive date boundaries. Things like holidays, leap years, accounting periods, yada yada where date intervals aren't standard can be a nuisance. – Error_2646 Jul 28 '22 at 14:23