I have the following DataFrame, df:

                  Date  Type
0  1990-01-01 02:00:00   1
1  1990-01-01 03:00:00   1
2  1990-01-01 04:00:00   1
3  1990-01-01 05:00:00   2
4  1990-01-01 06:00:00   2
5  1990-01-01 07:00:00   2

How do I get the time difference between consecutive rows, in seconds, in a new column df['dt'], computed within each group of df['Type']? The following works, but it ignores the groups:

df['dt'] = (df['Date'] - df['Date'].shift(1)).astype('timedelta64[s]')

How do I get this to work per Type group? Ideally, the first row of each new type should have a time difference of 0 s.

Jeroen
  • your solution gives me the error: ValueError: Could not convert object to NumPy timedelta. The complete code is: `import pandas as pd import numpy as np data_empty = {} df = pd.DataFrame(data_empty) df['HH'] = np.arange(0,10) start_date ='1990-01-01 00:00:00' df['Date'] = pd.to_datetime(df['HH'], unit='h', origin=start_date) df['Type'] = [1,1,1,1,1,2,2,2,2,2]` – Jeroen Mar 04 '20 at 13:34
  • Oops, mea culpa, I had not tested that solution. I have added an answer, which I hope now works correctly. – jezrael Mar 04 '20 at 13:43

1 Answer

For a solution with shift, subtract the per-group shifted values (from DataFrameGroupBy.shift) with Series.sub, replace the missing values with a zero Timedelta, and finally convert with Series.dt.total_seconds:

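import numpy as np
import pandas as pd

# Sample data: 10 hourly timestamps split into two types
df = pd.DataFrame()
df['HH'] = np.arange(0,10)
start_date = '1990-01-01 00:00:00'
df['Date'] = pd.to_datetime(df['HH'], unit='h', origin=start_date)
df['Type'] = [1,1,1,1,1,2,2,2,2,2]

# Previous timestamp within each Type (NaT on the first row of a group)
s = df.groupby("Type")['Date'].shift()
# Difference to the previous row; NaT becomes 0, then convert to seconds
df['dt2'] = df['Date'].sub(s).fillna(pd.Timedelta(0)).dt.total_seconds()
print(df)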
   HH                Date  Type     dt2
0   0 1990-01-01 00:00:00     1     0.0
1   1 1990-01-01 01:00:00     1  3600.0
2   2 1990-01-01 02:00:00     1  3600.0
3   3 1990-01-01 03:00:00     1  3600.0
4   4 1990-01-01 04:00:00     1  3600.0
5   5 1990-01-01 05:00:00     2     0.0
6   6 1990-01-01 06:00:00     2  3600.0
7   7 1990-01-01 07:00:00     2  3600.0
8   8 1990-01-01 08:00:00     2  3600.0
9   9 1990-01-01 09:00:00     2  3600.0
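
To see where the 0.0 at the start of each type comes from, it can help to inspect the intermediate values. The following is only a minimal sketch on the same sample data; the helper name `build_sample` and the variables `prev` and `delta` are hypothetical names introduced for illustration:

```python
import numpy as np
import pandas as pd


def build_sample():
    # Same sample frame as above: 10 hourly timestamps, two types.
    df = pd.DataFrame({"HH": np.arange(0, 10)})
    df["Date"] = pd.to_datetime(df["HH"], unit="h", origin="1990-01-01 00:00:00")
    df["Type"] = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
    return df


df = build_sample()

# Previous timestamp within the same Type. The first row of each group
# has no predecessor, so it is NaT.
prev = df.groupby("Type")["Date"].shift()
print(prev.isna().tolist())

# Subtracting NaT yields NaT, which fillna(pd.Timedelta(0)) later turns into 0.
delta = df["Date"].sub(prev)
print(delta.isna().tolist())
```

Both printed lists should flag only rows 0 and 5, the first row of each type, which is exactly where fillna puts the zero.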

Another option is DataFrameGroupBy.diff, which computes the per-group difference directly:

df['dt2'] = df.groupby("Type")['Date'].diff().fillna(pd.Timedelta(0)).dt.total_seconds()
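
As a quick sanity check, the two variants can be compared side by side. This is a sketch under the assumption that the same sample data is used; `grouped`, `via_shift` and `via_diff` are hypothetical names, not part of the original code:

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame({"HH": np.arange(0, 10)})
df["Date"] = pd.to_datetime(df["HH"], unit="h", origin="1990-01-01 00:00:00")
df["Type"] = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

grouped = df.groupby("Type")["Date"]

# Variant 1: subtract the per-group shifted timestamps.
via_shift = df["Date"].sub(grouped.shift()).fillna(pd.Timedelta(0)).dt.total_seconds()

# Variant 2: per-group diff in one step.
via_diff = grouped.diff().fillna(pd.Timedelta(0)).dt.total_seconds()

# Compare the two results element by element.
print(via_shift.tolist() == via_diff.tolist())
```

Both variants rely on the existing row order within each type, so sorting by Date first may be needed if the data is not already ordered.
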
jezrael