
I have a dataframe with the following information:

         Departure Time  Offset Dep  Arrival Time   Offset Arr
0          07:10           +01:00        08:25         +01:00
1          09:05           +01:00        10:10         +01:00
2          10:50           +01:00        12:05         +01:00
3          11:55           +01:00        14:15         +00:00
4          14:55           +02:00        18:40         +01:00


df.dtypes

Departure Time      object
Offset Departure    object
Arrival Time        object
Offset Arrival      object
dtype: object

I would like to calculate the time duration: Arrival Time + Offset Arr - Departure Time - Offset Dep

I first tried to convert all of them to the time format but I could only do this with the actual times, not the time offsets:

df["Arrival Time"] = pd.to_datetime(df["Arrival Time"]).dt.time
df["Departure Time"] = pd.to_datetime(df["Departure Time"]).dt.time

So my problem is twofold: first, how to convert the offset columns to a format I can use for time calculations, and second, how to calculate the time duration efficiently.

As I want to use the time duration for a data science calculation (Gradient Boosting), it would be great if you could suggest a duration format that can be plugged into the algorithm right away.

Huebschi

1 Answer

You can try the following approach:

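import pandas as pd

# Parse the "HH:MM" clock times (a placeholder date is attached and cancels out on subtraction)
df["Departure Time"] = pd.to_datetime(df["Departure Time"], format="%H:%M")
df["Arrival Time"] = pd.to_datetime(df["Arrival Time"], format="%H:%M")

# Convert the "+HH:MM" offsets to timedeltas; the parser expects HH:MM:SS, so append the seconds
df["Offset Dep"] = pd.to_timedelta(df["Offset Dep"] + ":00")
df["Offset Arr"] = pd.to_timedelta(df["Offset Arr"] + ":00")

# Duration as defined in the question
df["Time Duration"] = df["Arrival Time"] + df["Offset Arr"] - df["Departure Time"] - df["Offset Dep"]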

This converts the offset columns to timedeltas, which you can then add to or subtract from the datetime columns.
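
For the Gradient Boosting part of the question, a plain float column (for example, minutes) is usually the easiest thing to feed into a model. Below is a minimal, self-contained sketch that rebuilds the sample rows from the question and produces such a column. The helper `offset_to_timedelta` and the column name `Duration Min` are made up for illustration, and the signs follow the formula stated in the question. If the offsets are UTC offsets in the ISO 8601 sense, elapsed time would instead subtract each offset from its own clock time, so both offset signs would flip.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Departure Time": ["07:10", "09:05", "10:50", "11:55", "14:55"],
    "Offset Dep": ["+01:00", "+01:00", "+01:00", "+01:00", "+02:00"],
    "Arrival Time": ["08:25", "10:10", "12:05", "14:15", "18:40"],
    "Offset Arr": ["+01:00", "+01:00", "+01:00", "+00:00", "+01:00"],
})


def offset_to_timedelta(col):
    """Hypothetical helper: turn '+HH:MM' / '-HH:MM' strings into timedeltas."""
    parts = col.str.extract(r"^([+-]?)(\d{1,2}):(\d{2})$")
    # -1 for a leading minus sign, +1 otherwise
    sign = parts[0].eq("-").map({True: -1, False: 1})
    minutes = parts[1].astype(int) * 60 + parts[2].astype(int)
    return pd.to_timedelta(sign * minutes, unit="m")


# Clock times get a placeholder date that cancels out in the subtraction
dep = pd.to_datetime(df["Departure Time"], format="%H:%M")
arr = pd.to_datetime(df["Arrival Time"], format="%H:%M")
off_dep = offset_to_timedelta(df["Offset Dep"])
off_arr = offset_to_timedelta(df["Offset Arr"])

# Formula from the question: Arrival + Offset Arr - Departure - Offset Dep
duration = (arr + off_arr) - (dep + off_dep)

# Float minutes, usable directly as a numeric feature
df["Duration Min"] = duration.dt.total_seconds() / 60
print(df["Duration Min"].tolist())
```

Note that this assumes departure and arrival fall on the same calendar day; a flight that lands after midnight would come out negative and need a one-day correction.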

Nicole Douglas