Transforming a categorical column into columns in python

Question

I am trying to calculate the time period in seconds that cars were not available. I have the following table:

╔═════════════════════╦═══════════╦══════╦═════════════╗
║ statusDateTime      ║ shift     ║ car  ║ isAvaliable ║
║ 2019-04-02 02:58:39 ║ 190402001 ║ E077 ║ 1           ║
║ 2019-04-02 13:17:58 ║ 190402002 ║ E077 ║ 0           ║
║ 2019-04-02 13:35:10 ║ 190402002 ║ E077 ║ 1           ║
╚═════════════════════╩═══════════╩══════╩═════════════╝

To solving this I want to divide the last binary column (isAvaliable) into isAvaliable_0 and isAvaliable_1 where I will write the time from the statusDateTime column. As a result there is should be the following table:

╔═══════════╦══════╦═════════════════════╦═════════════════════╗
║ Shift     ║ Car  ║ isAvaliable_00      ║ isAvaliable_01      ║
║ 190402001 ║ E077 ║ 2019-04-02 02:58:39 ║ 2019-04-02 13:17:58 ║
║ 190402002 ║ E077 ║ 2019-04-02 13:35:10 ║ 2019-04-02 14:35:10 ║
╚═══════════╩══════╩═════════════════════╩═════════════════════╝

Is there some elegant way to doing this in python's pandas? Thanks!

score 0 · Answer 1 · answered Feb 27 '20 at 06:52

You can use the following function to get seconds between two dates

import numpy as np

def seconds_between(datestart, dateend):
    "Return seconds between start and end date"
    diff = dateend - datestart
    return diff / np.timedelta64(1, 's')

df['diff'] = seconds_between(df['statusDateTime'].shift(-1),df['statusDateTime'])

Now all you have to do is to make a selection on the two dates that you want to find the difference between using your logic.

score 0 · Answer 2 · answered Feb 27 '20 at 08:20

Taking a simplistic SQL-like approach works quite well.

Segregate the rows into two dataframes, one where isAvaliable=0, and the other where isAvaliable=1.
Outer merge these dataframes. Outer merge will account for missing status of 'isAvaliable'.

df=pd.DataFrame([['2019-04-02 02:58:39','190402001','E077',1],
                 ['2019-04-02 13:17:58','190402001','E077',0],
                 ['2019-04-02 13:35:10','190402002','E077',0],
                 ['2019-04-02 14:35:10','190402002','E077',1]
                ],columns=['statusDateTime','shift','car','isAvaliable'])
df0=df[df['isAvaliable']==0].drop('isAvaliable',axis=1)
df1=df[df['isAvaliable']==1].drop('isAvaliable',axis=1)
new=df1.merge(df0,on=['shift','car'],suffixes=['_1','_0'],how='outer')

This yields:

statusDateTime_1     shift        car    statusDateTime_0
2019-04-02 02:58:39  190402001    E077   2019-04-02 13:17:58
2019-04-02 14:35:10  190402002    E077   2019-04-02 13:35:10

I am assuming that there is a typo in your question. '2019-04-02 13:17:58' should be in shift '190402001'

Transforming a categorical column into columns in python

2 Answers2