2

I have a dataframe like below:

import pandas as pd

data={'time':['2021-01-01 22:00:12','2021-01-05 22:49:12','2021-01-06 21:00:00','2021-01-06 23:59:15','2021-01-07 05:00:55','2021-01-07 12:00:39'],
    'flag':['On','Off','On','Off','On','Off']}
df=pd.DataFrame(data)

I want to get the difference between consecutive rows, which I accomplished using:

df['diff']=pd.to_datetime(df['time'])-pd.to_datetime(df['time'].shift(1))

But this does unnecessary work, because the difference is not meaningful for every pair of consecutive rows: I only want the difference when the flag goes to Off. Also, how do I convert the difference into hours?


2 Answers

1

You can start a new virtual group each time the "On" flag is encountered and then calculate the diff within each group. Or you can calculate the diff for the whole dataframe like you did and hide the values where the flag is "On":

# convert the time column to datetime64
df['time'] = pd.to_datetime(df['time'])

# start a new virtual group at each "On" row, then diff within each group
df['diff'] = df.groupby(df['flag'].eq('On').cumsum())['time'].diff()
# OR: diff over the whole column, then hide the "On" rows
df['diff'] = df['time'].diff().mask(df['flag'] == 'On')

Output:

>>> df
                 time flag            diff
0 2021-01-01 22:00:12   On             NaT
1 2021-01-05 22:49:12  Off 4 days 00:49:00
2 2021-01-06 21:00:00   On             NaT
3 2021-01-06 23:59:15  Off 0 days 02:59:15
4 2021-01-07 05:00:55   On             NaT
5 2021-01-07 12:00:39  Off 0 days 06:59:44
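
The question also asks for the difference in hours. A minimal sketch, assuming the `diff` column holds timedeltas as in the output above; the `diff_hours` column name is only illustrative:

```python
import pandas as pd

data = {
    'time': ['2021-01-01 22:00:12', '2021-01-05 22:49:12', '2021-01-06 21:00:00',
             '2021-01-06 23:59:15', '2021-01-07 05:00:55', '2021-01-07 12:00:39'],
    'flag': ['On', 'Off', 'On', 'Off', 'On', 'Off'],
}
df = pd.DataFrame(data)
df['time'] = pd.to_datetime(df['time'])

# difference to the previous row, hidden on the "On" rows
df['diff'] = df['time'].diff().mask(df['flag'] == 'On')

# dividing a timedelta column by a one-hour Timedelta yields float hours;
# NaT entries become NaN
df['diff_hours'] = df['diff'] / pd.Timedelta(hours=1)
```

For the sample data this should give roughly 96.82, 2.99 and 7.00 hours on the three "Off" rows.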
Corralien
0

Keep the difference only on rows where the flag switches from On to Off, and convert it to hours:

df['time'] = pd.to_datetime(df['time'])

mask = df['flag'].eq('Off') & df['flag'].shift().eq('On')
df['diff'] = df['time'].sub(df['time'].shift()).where(mask).dt.total_seconds() / 3600

                 time flag       diff
0 2021-01-01 22:00:12   On        NaN
1 2021-01-05 22:49:12  Off  96.816667
2 2021-01-06 21:00:00   On        NaN
3 2021-01-06 23:59:15  Off   2.987500
4 2021-01-07 05:00:55   On        NaN
5 2021-01-07 12:00:39  Off   6.995556
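
On the sample data, where the flags strictly alternate, hiding the "On" rows and keeping only the On-to-Off switches give the same rows. They would differ if two "Off" rows appeared in a row. A small sketch on made-up data (the three timestamps and the variable names are hypothetical, not from the question):

```python
import pandas as pd

# hypothetical data with two consecutive "Off" rows
df = pd.DataFrame({
    'time': pd.to_datetime(['2021-01-01 10:00:00',
                            '2021-01-01 12:00:00',
                            '2021-01-01 15:00:00']),
    'flag': ['On', 'Off', 'Off'],
})

delta = df['time'].diff()

# hide only the "On" rows: every "Off" row keeps its difference
hide_on = delta.mask(df['flag'] == 'On').dt.total_seconds() / 3600

# keep only the rows where the flag switches from "On" to "Off"
switch = df['flag'].eq('Off') & df['flag'].shift().eq('On')
on_to_off = delta.where(switch).dt.total_seconds() / 3600
```

Here `hide_on` is NaN, 2.0, 3.0, while `on_to_off` is NaN, 2.0, NaN, so the choice depends on whether a repeated "Off" should still get a difference.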
Shubham Sharma