17

I have the following dataframe:

               Time   Work
2018-12-01 10:00:00     Off
2018-12-01 10:00:02     On
2018-12-01 10:00:05     On
2018-12-01 10:00:06     On
2018-12-01 10:00:07     On
2018-12-01 10:00:09    Off
2018-12-01 10:00:11    Off
2018-12-01 10:00:14     On
2018-12-01 10:00:16     On
2018-12-01 10:00:18     On
2018-12-01 10:00:20    Off

I would like to create a new column with the elapsed time since the device started working.

               Time   Work    Elapsed Time
2018-12-01 10:00:00    Off               0
2018-12-01 10:00:02     On               2
2018-12-01 10:00:05     On               5
2018-12-01 10:00:06     On               6
2018-12-01 10:00:07     On               7
2018-12-01 10:00:09    Off               0
2018-12-01 10:00:11    Off               0
2018-12-01 10:00:14     On               3
2018-12-01 10:00:16     On               5
2018-12-01 10:00:18     On               7
2018-12-01 10:00:20    Off               0

How can I do it?

Rafael
  • 193
  • 5
  • 4
    Welcome to Stack Overflow, Rafael! I definitely came here just because the title seemed amusing, but left learning what Pandas actually means in this context. – zr00 Dec 06 '18 at 20:21

5 Answers

14

You can use groupby:

# df['Time'] = pd.to_datetime(df['Time'], errors='coerce') # Uncomment if needed.
sec = df['Time'].dt.second
df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))

df
                  Time Work  Elapsed Time
0  2018-12-01 10:00:00  Off             0
1  2018-12-01 10:00:02   On             2
2  2018-12-01 10:00:05   On             5
3  2018-12-01 10:00:06   On             6
4  2018-12-01 10:00:07   On             7
5  2018-12-01 10:00:09  Off             0
6  2018-12-01 10:00:11  Off             0
7  2018-12-01 10:00:14   On             3
8  2018-12-01 10:00:16   On             5
9  2018-12-01 10:00:18   On             7
10 2018-12-01 10:00:20  Off             0

The idea is to extract the seconds portion and, within each group, subtract the value at the group's first row (the last time the state was "Off") from every row. This is done using transform and first.
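As a minimal illustration of how transform('first') broadcasts each group's first value back to every row (toy data, not from the question):

```python
import pandas as pd

s = pd.Series([10, 12, 15, 20, 23])  # values (think: seconds)
g = pd.Series([1, 1, 1, 2, 2])       # group ids (think: the Off-cumsum)

# Every row receives its group's first value; subtracting it
# yields each row's offset within its own group.
first = s.groupby(g).transform('first')
print((s - first).tolist())  # [0, 2, 5, 0, 3]
```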

cumsum is used to identify groups:

df.Work.eq('Off').cumsum()

0     1
1     1
2     1
3     1
4     1
5     2
6     3
7     3
8     3
9     3
10    4
Name: Work, dtype: int64

If your device can stay "On" across a minute boundary, the seconds component alone is not enough (it wraps around every minute). In that case, initialise sec as whole seconds since the epoch instead:

sec = df['Time'].astype('int64') // 10**9  # nanoseconds since epoch -> whole seconds

df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))

df
                  Time Work  Elapsed Time
0  2018-12-01 10:00:00  Off             0
1  2018-12-01 10:00:02   On             2
2  2018-12-01 10:00:05   On             5
3  2018-12-01 10:00:06   On             6
4  2018-12-01 10:00:07   On             7
5  2018-12-01 10:00:09  Off             0
6  2018-12-01 10:00:11  Off             0
7  2018-12-01 10:00:14   On             3
8  2018-12-01 10:00:16   On             5
9  2018-12-01 10:00:18   On             7
10 2018-12-01 10:00:20  Off             0
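A quick sanity check that epoch seconds survive a minute rollover, where the seconds component alone would go negative (toy timestamps, not from the question):

```python
import pandas as pd

# Crossing a minute boundary: .dt.second gives 58 then 2 (a negative diff),
# while epoch seconds keep increasing monotonically.
t = pd.Series(pd.to_datetime(['2018-12-01 10:00:58', '2018-12-01 10:01:02']))
sec = t.astype('int64') // 10**9  # nanoseconds since epoch -> whole seconds
print((sec - sec.iloc[0]).tolist())  # [0, 4]
```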
cs95
  • 379,657
  • 97
  • 704
  • 746
  • @Rafael Yeah, the assumption here is that your row starts in the "Off" condition. Can you append a row at the beginning of your frame? – cs95 Dec 06 '18 at 19:19
  • @Rafael Okay, and regarding your second point, does `df['Time'].values.astype(np.int64) // 10e8` work? – cs95 Dec 06 '18 at 19:21
  • The code worked fine for seconds. However, when the first cell of the column Work was 'On' the elapsed time did not begin in zero. Besides, when the time changed to the next minute, the elapsed time was negative. I tried using sec = df['Time'].astype(int) but I got the error: cannot astype a datetimelike from [datetime64[ns]] to [int32]; – Rafael Dec 06 '18 at 19:32
  • @Rafael can you read my comments just above yours again please? – cs95 Dec 06 '18 at 19:32
  • I deleted the comment and posted it again so I could edit it. Regarding your answers, I receive the data every day, it begins 'On' and ends 'On', so I am not sure if I can append a row, but I will try to using the date change as condition. The code df['Time'].values.astype(np.int64) // 10e8 did work. – Rafael Dec 06 '18 at 19:49
8

IIUC, first with transform:

(df.Time-df.Time.groupby(df.Work.eq('Off').cumsum()).transform('first')).dt.seconds
Out[1090]: 
0     0
1     2
2     5
3     6
4     7
5     0
6     0
7     3
8     5
9     7
10    0
Name: Time, dtype: int64
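One caveat worth knowing: .dt.seconds returns only the seconds component of a Timedelta (days are dropped), whereas .dt.total_seconds() returns the full duration. For sub-day gaps like the question's they agree; a sketch of where they differ:

```python
import pandas as pd

td = pd.Series([pd.Timedelta(days=1, seconds=5)])
print(td.dt.seconds.tolist())          # [5]       component only; the day is dropped
print(td.dt.total_seconds().tolist())  # [86405.0] full duration in seconds
```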
BENY
  • 317,841
  • 20
  • 164
  • 234
  • If I set the column Time as the index, how should I change the code so it would also work? – Rafael Dec 11 '18 at 14:53
  • @Rafael df.reset_index(inplace=True) – BENY Dec 11 '18 at 14:56
  • I added the line df.set_index('Time', inplace=True) before the code you wrote for the elapsed time. So I have to adapt the code to subtract in the index column instead of the Time column. I tried (df.index-df.index.groupby(df.Operation.eq('Off').cumsum()).transform('first')) but it did not work. – Rafael Dec 11 '18 at 15:06
  • @Rafael this is `df.reset_index(inplace=True)` reset not set – BENY Dec 11 '18 at 15:07
7

You could use two groupbys. The first computes the consecutive time differences within each group; the second then takes the cumulative sum of those differences within each group.

s = (df.Work=='Off').cumsum()
df['Elapsed Time'] = df.groupby(s).Time.diff().dt.total_seconds().fillna(0).groupby(s).cumsum()

Output

                  Time Work  Elapsed Time
0  2018-12-01 10:00:00  Off           0.0
1  2018-12-01 10:00:02   On           2.0
2  2018-12-01 10:00:05   On           5.0
3  2018-12-01 10:00:06   On           6.0
4  2018-12-01 10:00:07   On           7.0
5  2018-12-01 10:00:09  Off           0.0
6  2018-12-01 10:00:11  Off           0.0
7  2018-12-01 10:00:14   On           3.0
8  2018-12-01 10:00:16   On           5.0
9  2018-12-01 10:00:18   On           7.0
10 2018-12-01 10:00:20  Off           0.0
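The two groupbys can be watched separately on a trimmed-down frame (column names as in the question; a sketch):

```python
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2018-12-01 10:00:00', '2018-12-01 10:00:02',
                            '2018-12-01 10:00:05', '2018-12-01 10:00:09']),
    'Work': ['Off', 'On', 'On', 'Off'],
})
s = (df.Work == 'Off').cumsum()  # group ids: 1, 1, 1, 2

# Step 1: per-group consecutive differences (group firsts become 0 via fillna)
diffs = df.groupby(s).Time.diff().dt.total_seconds().fillna(0)
print(diffs.tolist())                      # [0.0, 2.0, 3.0, 0.0]

# Step 2: running total of those differences within each group
print(diffs.groupby(s).cumsum().tolist())  # [0.0, 2.0, 5.0, 0.0]
```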
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • The code worked fine. However, when the first work cell of the dataframe was 'On', the elapsed time was not zero. – Rafael Dec 06 '18 at 19:25
  • @Rafael good point. There might be a neat way to fix it in the calculation, but you can just fix it after the fact with `df.loc[df.index < s[s==1].idxmax(), 'Elapsed Time'] = 0`. I guess there's still an issue if the machine never turns on, but that can be fixed or handled too. – ALollz Dec 06 '18 at 19:34
4

Using a groupby, you can do this:

df['Elapsed Time'] = (df.groupby(df.Work.eq('Off').cumsum()).Time
                      .transform(lambda x: x.diff()
                                 .dt.total_seconds()
                                 .cumsum())
                      .fillna(0))

>>> df
                  Time Work  Elapsed Time
0  2018-12-01 10:00:00  Off           0.0
1  2018-12-01 10:00:02   On           2.0
2  2018-12-01 10:00:05   On           5.0
3  2018-12-01 10:00:06   On           6.0
4  2018-12-01 10:00:07   On           7.0
5  2018-12-01 10:00:09  Off           0.0
6  2018-12-01 10:00:11  Off           0.0
7  2018-12-01 10:00:14   On           3.0
8  2018-12-01 10:00:16   On           5.0
9  2018-12-01 10:00:18   On           7.0
10 2018-12-01 10:00:20  Off           0.0
sacuL
  • 49,704
  • 8
  • 81
  • 106
4

A NumPy slicing approach

import numpy as np

# f: index of each group's first occurrence; i: maps every row to its group
u, f, i = np.unique(df.Work.eq('Off').values.cumsum(),
                    return_index=True, return_inverse=True)
t = df.Time.values

df['Elapsed Time'] = t - t[f[i]]
df

                  Time Work Elapsed Time
0  2018-12-01 10:00:00  Off     00:00:00
1  2018-12-01 10:00:02   On     00:00:02
2  2018-12-01 10:00:05   On     00:00:05
3  2018-12-01 10:00:06   On     00:00:06
4  2018-12-01 10:00:07   On     00:00:07
5  2018-12-01 10:00:09  Off     00:00:00
6  2018-12-01 10:00:11  Off     00:00:00
7  2018-12-01 10:00:14   On     00:00:03
8  2018-12-01 10:00:16   On     00:00:05
9  2018-12-01 10:00:18   On     00:00:07
10 2018-12-01 10:00:20  Off     00:00:00
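What np.unique is doing here, on a toy array of group ids (a sketch): f holds the first index of each group, and f[i] broadcasts that first index back to every row.

```python
import numpy as np

g = np.array([1, 1, 1, 2, 3, 3])  # group ids, as produced by the Off-cumsum
u, f, i = np.unique(g, return_index=True, return_inverse=True)
print(f.tolist())     # [0, 3, 4]           first position of each group
print(f[i].tolist())  # [0, 0, 0, 3, 4, 4]  each row mapped to its group's first position
```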

We can nail down the integer bit with

df['Elapsed Time'] = (t - t[f[i]]).astype('timedelta64[s]').astype(int)
df

                  Time Work  Elapsed Time
0  2018-12-01 10:00:00  Off             0
1  2018-12-01 10:00:02   On             2
2  2018-12-01 10:00:05   On             5
3  2018-12-01 10:00:06   On             6
4  2018-12-01 10:00:07   On             7
5  2018-12-01 10:00:09  Off             0
6  2018-12-01 10:00:11  Off             0
7  2018-12-01 10:00:14   On             3
8  2018-12-01 10:00:16   On             5
9  2018-12-01 10:00:18   On             7
10 2018-12-01 10:00:20  Off             0
piRSquared
  • 285,575
  • 57
  • 475
  • 624