0

I have a pandas data frame with employee start time and end time. I want to know how many hours an employee has worked in a given shift (Shift1: 8:00am-2:00pm; shift2: 2pm-10pm, and shift3: 10pm-8am). Your help is appreciated.

         Start        End
0 2015-01-01 18:44:00 2015-01-02 07:31:00
1 2015-01-01 06:38:00 2015-01-01 19:57:00
2 2015-01-01 06:34:00 2015-01-01 19:13:00
3 2015-01-01 18:48:00 2015-01-02 07:15:00
4 2015-01-01 06:50:00 2015-01-01 20:02:00
Bonifacio2
  • 3,405
  • 6
  • 34
  • 54
despy
  • 1
  • What is your expected output? – Zero Jul 05 '17 at 16:42
  • 1
    `df.diff(axis=1)` – Paul H Jul 05 '17 at 16:45
  • in the meantime of showing us your expected output, Check [this link](https://stackoverflow.com/questions/39370879/extract-hour-from-timestamp-with-python) on getting the hours extracted. do you want to take minutes/seconds into play? – MattR Jul 05 '17 at 16:45
  • I am expecting three additional columns representing each shift, with the number of hours an employee has worked in each shift – despy Jul 05 '17 at 17:09

1 Answers1

0

Note that my answer is not quite polish yet. First, I create example dataset as in question.

import pandas as pd

df = pd.DataFrame([
    ['2015-01-01 18:44:00', '2015-01-02 07:31:00'], 
    ['2015-01-01 06:38:00', '2015-01-01 19:57:00'],
    ['2015-01-01 06:34:00', '2015-01-01 19:13:00'],
    ['2015-01-01 18:48:00', '2015-01-02 07:15:00'],
    ['2015-01-01 06:50:00', '2015-01-01 20:02:00']
], columns=['start', 'stop'])

df.start = pd.to_datetime(df.start)
df.stop = pd.to_datetime(df.stop)

Then find time of work between given each shift interval

from datetime import datetime, timedelta


def find_interval(r):
    """
    r: row of dataframe, with 'start' and 'stop' column
    """
    t_start = r['start']
    t_stop = r['stop']
    t = t_start
    s1_start = datetime(t.date().year, t.date().month, t.date().day, 8)
    s1_stop = datetime(t.date().year, t.date().month, t.date().day, 14)
    s2_start = datetime(t.date().year, t.date().month, t.date().day, 14)
    s2_stop = datetime(t.date().year, t.date().month, t.date().day, 22)
    s3_start = datetime(t.date().year, t.date().month, t.date().day, 22)
    s3_stop = datetime(t.date().year, t.date().month, t.date().day + 1, 8)

    shift_hours = []
    for (s_start, s_stop) in [(s1_start, s1_stop), (s2_start, s2_stop), (s3_start, s3_stop)]:
        if t_stop < s_start:
            shift_hours.append(timedelta(seconds=0))
        elif t_stop > s_start and t_stop < s_stop:
            shift_hours.append(t_stop - s_start)
        elif t_start < s_stop and t_stop > s_stop:
            shift_hours.append(s_stop - t_start)
        else:
            shift_hours.append(timedelta(seconds=0))
    return shift_hours

Concatenate back

df_shift = pd.DataFrame([find_interval(r) for _, r in df.iterrows()])
df_out = pd.concat((df, df_shift), axis=1) # output
titipata
  • 5,321
  • 3
  • 35
  • 59