I have a dataframe with the following information:

  Departure Time Offset Dep Arrival Time Offset Arr
0          05:10     +01:00        08:25     +01:00
1          08:05     +01:00        10:10     +01:00
2          11:50     +01:00        12:05     +01:00
3          11:55     +01:00        14:15     +00:00
4          14:55     +02:00        18:40     +01:00


df.dtypes

Departure Time      object
Offset Departure    object
Arrival Time        object
Offset Arrival      object
dtype: object

I would like to create another column for the departure time that includes the departure offset as the offset in datetime format.

I tried this, but without success:

df['Departure_Time'] = df['Departure Time'] + df['Offset Departure'].astype('timedelta64')

ValueError: Could not convert object to NumPy timedelta
Huebschi

2 Answers

0

Take a look at this link: https://github.com/numpy/numpy/issues/4440. I think you should first define a Timedelta object with timedelta64 and then use it.

0

Sample df:

  Departure Time Offset Dep Arrival Time Offset Arr
0          05:10     +01:00        08:25     +01:00
1          08:05     +01:00        10:10     +01:00
2          11:50     +01:00        12:05     +01:00
3          11:55     +01:00        14:15     +00:00
4          14:55     +02:00        18:40     +01:00
5          13:15     -03:00        20:20     -02:00

You can first convert 'Offset Dep' to timedelta using pd.to_timedelta, but you'll need to append the :ss portion to each string:

pd.to_timedelta(df['Offset Dep'].astype(str) + ':00')
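
As a minimal, self-contained sketch of this step (the values are taken from the sample df; the variable names offsets and deltas are only illustrative), appending ':00' gives each offset the hh:mm:ss form that pd.to_timedelta parses, sign included:

```python
import pandas as pd

# Offset strings in the '+hh:mm' / '-hh:mm' form used in the sample df.
offsets = pd.Series(['+01:00', '+02:00', '-03:00'])

# Append the seconds portion so each string reads as '[+-]hh:mm:ss'.
deltas = pd.to_timedelta(offsets + ':00')

print(deltas)
```

A negative offset such as -03:00 becomes a negative timedelta, so adding it later shifts the time backwards.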

Then, you can convert 'Departure Time' to datetime and add the timedelta:

>>> df['Departure Time'].astype('datetime64') + pd.to_timedelta(df['Offset Dep'].astype(str) + ':00')
0   2020-04-24 06:10:00
1   2020-04-24 09:05:00
2   2020-04-24 12:50:00
3   2020-04-24 12:55:00
4   2020-04-24 16:55:00
5   2020-04-24 10:15:00
dtype: datetime64[ns]
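
Depending on your pandas version, the unit-less .astype('datetime64') cast may be rejected (that is an assumption about your installation, not something from the question). A hedged alternative sketch is to parse the times with pd.to_datetime and an explicit format, which anchors every time to the placeholder date 1900-01-01, and then format the shifted result back to HH:MM. The column name 'Departure Shifted' is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'Departure Time': ['05:10', '08:05', '11:50', '11:55', '14:55', '13:15'],
    'Offset Dep': ['+01:00', '+01:00', '+01:00', '+01:00', '+02:00', '-03:00'],
})

# Parse 'HH:MM' with an explicit format; the date part defaults to 1900-01-01.
times = pd.to_datetime(df['Departure Time'], format='%H:%M')

# Convert '+hh:mm' offsets to timedeltas by appending the seconds portion.
offsets = pd.to_timedelta(df['Offset Dep'] + ':00')

# Add the offset and keep only the time of day as a string.
df['Departure Shifted'] = (times + offsets).dt.strftime('%H:%M')

print(df)
```

Because only the time of day is kept, a shift that crosses midnight wraps around without any indication of the date change.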

If your columns contain NaN values, e.g.:

  Departure Time Offset Dep Arrival Time Offset Arr
0          05:10     +01:00        08:25     +01:00
1          08:05     +01:00        10:10     +01:00
2          11:50     +01:00        12:05     +01:00
3          11:55     +01:00        14:15     +00:00
4          14:55     +02:00        18:40     +01:00
5            NaN        NaN          NaN        NaN
6          13:15     -03:00        20:20     -02:00

You can exclude the NaN rows with a mask:

mask = df['Offset Dep'].notna()
df.loc[mask, 'Departure Time'] = (
    df.loc[mask, 'Departure Time'].astype('datetime64')
    + pd.to_timedelta(df.loc[mask, 'Offset Dep'].astype(str) + ':00')
).dt.strftime('%H:%M')

Result:

  Departure Time Offset Dep Arrival Time Offset Arr
0          06:10     +01:00        08:25     +01:00
1          09:05     +01:00        10:10     +01:00
2          12:50     +01:00        12:05     +01:00
3          12:55     +01:00        14:15     +00:00
4          16:55     +02:00        18:40     +01:00
5            NaN        NaN          NaN        NaN
6          10:15     -03:00        20:20     -02:00
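
The same masked approach can be sketched with pd.to_datetime and an explicit format instead of the unit-less cast. This is an illustrative variant rather than the exact code above: it uses a shortened sample and writes to a hypothetical new column, 'Departure Shifted', so the original values are preserved:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Departure Time': ['05:10', '08:05', np.nan, '13:15'],
    'Offset Dep': ['+01:00', '+01:00', np.nan, '-03:00'],
})

# Only rows that have both a time and an offset are processed.
mask = df['Departure Time'].notna() & df['Offset Dep'].notna()

times = pd.to_datetime(df.loc[mask, 'Departure Time'], format='%H:%M')
offsets = pd.to_timedelta(df.loc[mask, 'Offset Dep'] + ':00')

# Assignment aligns on the index, so rows outside the mask become NaN.
df['Departure Shifted'] = (times + offsets).dt.strftime('%H:%M')

print(df)
```

Skipping .astype(str) here avoids turning NaN into the string 'nan', which is what produces the "expecting hh:mm:ss format, received: nan:00" error.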
r.ook
  • Thank you for your answer. I get this error: ValueError: expecting hh:mm:ss format, received: nan:00 – Huebschi Apr 24 '20 at 20:15
  • Ah, you have some `nan` in your column then. For those rows, what does your `Departure Time` look like? `nan` also? You can either default these `Offset` values to `00:00:00` (i.e. no change, if `Departure Time` exists) or exclude these rows from your process altogether. – r.ook Apr 24 '20 at 20:17
  • Please see edit. Was a bit more complicated than I imagined due to type casting. – r.ook Apr 24 '20 at 20:49