2

Anyone can help with this one please? Our attendance system generates the following data:

User    Department  Date        Time        Reader
A1      IT          1/3/2014    11:12:00    1
B1      IT          1/3/2014    12:28:06    1
B1      IT          1/3/2014    12:28:07    1
A1      IT          1/3/2014    13:12:00    2
B1      IT          1/3/2014    13:28:06    2
A1      IT          2/3/2014    07:42:15    1
A1      IT          2/3/2014    16:16:15    2

Where Reader value,

  • 1 = Entry
  • 2 = Exit

I'm looking for SQL query to run on MS SQL 2005 that summarize attendance time for each employee on monthly basis, for instance

User    Department  Month   Time
A1      IT          3/2014  10.34
B1      IT          3/2014  01:00
mellamokb
  • 56,094
  • 12
  • 110
  • 136
user3495680
  • 21
  • 1
  • 4

1 Answers1

3

This is a fairly difficult problem to solve with SQL due to the need to find transitions and ranges in the data, which is not trivial. I've broken the problem down into a series of steps made of successive cte's that build on one another and lead up to a final working solution:

First, I add a row index to the data to provide a simple PK for identifying a unique row:

with NumberedAtt as (
  select
    row_number() over (partition by [user] order by date, time, reader) as ix,
    att.[user],
    att.[department],
    att.[date] + att.[time] as dt,
    att.[reader]
  from att
)

Then I grab the first and last index value per user which will be used for the outermost boundaries of each entry/exit range:

, MinMax as (
  select [user], min(ix) ixMin, max(ix) ixMax
  from NumberedAtt N group by [user]
)

Next I put these together to generate a list of all exit/entry ranges, which are the points where the value of Reader changes from 2 to 1. These are the specific points that exactly identify when a previous time range ends, and the next time range begins (and cleanly handles successive duplicate entry/exit reads). By combining this with the first entry and last exit time for each user, a list of all entry/exit transitions is generated:

, Transitions as (
  select N.[User], 0 as exitIx, M.ixMin as entryIx
  from NumberedAtt N
  join MinMax M on N.[User] = M.[User]
  where N.ix = M.ixMin

  union

  select N.[User], M.ixMax as exitIx, 0 as entryIx
  from NumberedAtt N
  join MinMax M on N.[User] = M.[User]
  where N.ix = M.ixMax

  union

  select A1.[User], A1.ix as exitIx, A2.ix as entryIx
  from NumberedAtt A1
  join NumberedAtt A2 on A1.ix + 1 = A2.ix and A1.[user] = A2.[user]
  where A1.[reader] = 2 and A2.[reader] = 1
)

Here is the output at this point:

| USER | EXITIX | ENTRYIX |
|------|--------|---------|
|   A1 |      0 |       1 |
|   A1 |      2 |       3 |
|   A1 |      4 |       0 |
|   B1 |      0 |       1 |
|   B1 |      3 |       0 |

Notice that we've neatly captured all of the row indexes where a range of time begins and ends. However, they are offset by one - that is the entry time in one row corresponds to the exit time in the next row. So we need one more transformation to bring the ranges back together by adding a row index to this table and joining each row with the following row:

, NumberedTransitions as (
  select
    row_number() over (partition by [User] order by exitIx) tix,
    T.*
  from Transitions T
), EntryExit as (
  select
    aEntry.ix as ixEntry,
    aExit.ix as ixExit,
    aEntry.[user],
    aEntry.[department],
    aEntry.[dt] as entryDT,
    aExit.[dt] as exitDT
  from NumberedTransitions tEntry
  join NumberedAtt aEntry on tEntry.entryIx = aEntry.ix and tEntry.[user] = aEntry.[user]
  join NumberedTransitions tExit on tEntry.tix + 1 = tExit.tix and tEntry.[user] = tExit.[user]
  join NumberedAtt aExit on tExit.exitIx = aExit.ix and tExit.[user] = aExit.[user]
)

After joining the successive ranges together, I also pull in the original detail data back in, since I've been working only with the row index values so far. At the end of this subquery, we have identified all the entry/exit ranges per user and "swallowed up" any multiple reads:

| IXENTRY | IXEXIT | USER | DEPARTMENT |                      ENTRYDT |                       EXITDT |
|---------|--------|------|------------|------------------------------|------------------------------|
|       1 |      2 |   A1 |         IT | March, 01 2014 11:12:00+0000 | March, 01 2014 13:12:00+0000 |
|       3 |      4 |   A1 |         IT | March, 02 2014 07:42:15+0000 | March, 02 2014 16:16:15+0000 |
|       1 |      3 |   B1 |         IT | March, 01 2014 12:28:06+0000 | March, 01 2014 13:28:06+0000 |

Now the only thing left to do is group the data together to report on total hours per user, per month. It is a little tricky to calculate the total hours, but it can be done by taking the sum of minutes between the ranges and then converting the result back into a time value:

, Hours as (
  select
    [User],
    [Department],
    Year(EntryDT) Year,
    Month(EntryDT) Month,
    RIGHT('0' + CAST(SUM(DATEDIFF(Minute, EntryDT, ExitDT)) / 60 as varchar(10)), 2) + ':' +
      RIGHT('0' + CAST(SUM(DATEDIFF(Minute, EntryDT, ExitDT)) % 60 as varchar(2)), 2) as TotalHours
  from EntryExit EE
  group by [User], [Department], Year(EntryDT), Month(EntryDT)
)

This gives a final result which is very close to the desired result:

| USER | DEPARTMENT | YEAR | MONTH | TOTALHOURS |
|------|------------|------|-------|------------|
|   A1 |         IT | 2014 |     3 |   10:34:00 |
|   B1 |         IT | 2014 |     3 |   01:00:00 |

A few tweaks could be made to the formatting as desired, but that should be easy to build on top of this framework.

Here is a working demo: http://www.sqlfiddle.com/#!3/f3f37/7

mellamokb
  • 56,094
  • 12
  • 110
  • 136
  • Thanks Mellamokb, appreciated. Is there a simple way to count (additional column) total hours based on first IXENTRY and last IXEXIT; with other words it will ignore exits/entries during same day? – user3495680 Apr 09 '14 at 18:36
  • In case anybody else is looking for a general solution, there are some edge cases that this answer does not take into account. If somebody is working over the change of the month, such as a night shift, then that time is entirely attributed to the month when they entered, even though some hours were worked in the following month. – Eldritch Cheese Sep 28 '20 at 14:21