9

Original Question

I'm stuck on the following problem. I'm trying to figure out at which moments in time and for how long a vehicle is situated at the factory. I have an excel sheet in which all events are stored which are either delivery routes or maintenance events. The ultimate goal is to obtain a dataframe in which the vehicle registration number is given with the corresponding arrival at the factory and the time spend there(including maintenance actions). For people interested, this is because I ultimately want to be able to schedule non-critical maintenance actions on the vehicles.

An example of my dataframe would be:

  Registration RoutID       Date Dep Loc Arr Loc Dep Time Arr Time  Days
0         XC66    A58  20/May/17    Home   Loc A    10:54    21:56     0
1         XC66    A59  21/May/17   Loc A    Home    00:12    10:36     0
2         XC66   A345  21/May/17   Home    Loc B    12:41    19:16     0
3         XC66   A346  21/May/17   Loc B   Loc C    20:50    03:49     1
4         XC66   A347  22/May/17   Loc C    Home    06:10    07:40     0
5         XC66    #M1  22/May/17    Home    Home    10:51    13:00     0

I have created a script in which the dates and times are all processed to create the correct datetime columns for the arrival and departure datetimes. For the maintenance periods: "Dep Loc" = Home and "Arr Loc" = Home the following code is used to single out the relevant lines:

df_home = df[df["Dep Loc"].isin(["Home"])]
df_home = df_home[df_home["Arr Loc"].isin(["Home"])]

From here I can easily subtract the dates to create the duration column.

So far so good. However, I'm stuck on using calculating the other times. This because there might be intermediate stops, so the .shift() function does not work as the amount of rows to shift by is not-constant.

I have tried to search on this matter but I could only find shift solutions, or answers that are based in the internal event times, but not on the time between events.

Any guidance in the right direction would be greatly appreciated!

Regards

Attempt of the Solution

I have been stuck on this question for a while now, but shortly after posting this question I tried this solution:

for idx, loc in enumerate(df["Arr Loc"]):
    if loc == "Home":
        a = ((idx2, obj) for idx2, obj in enumerate(df["Dep Loc"]) if (obj == "Home" and idx2 > idx))
        idx_next = next(a)
        idx_next = idx_next[0]

        Arrival_times = df["Arr Time"]
        Departure_times = df["Dep Time"]

        Duration = Arrival_times[idx] - Departure_times[idx_next]

Here I used the next function to find the next occurrence of Home as the starting location(i.e. the time the vehicle leaves the base). Subsequently I subtract the two dates to find the proper time difference.

It works for the small data set, but not still for the entire dataset.

jeff
  • 151
  • 7
  • 2
    What is your expected output from this dataframe? – Scott Boston Jul 25 '17 at 15:47
  • Sorry if I wasn't clear. I eventually want a frame that has the columns: Registration - Arrival at Base - Time spend at Base - Action While at Base So for this specific step I want the output: Time leaving the base minus Time arrived at the base If I make it more clear this way? – jeff Jul 25 '17 at 15:49
  • Edit: For the presented frame this would be: 12:41 from line 2 minus 10:36 from line 1 and 10:51 from line 7 minus 07:40 from line 6, stored in a duration column with on the same row the registration number and the date of the initial arrival – jeff Jul 25 '17 at 15:57
  • Do you care about the intervening time steps, or only where Dep Loc / Arr Loc == 'Home', and then matching up the next arrival for each corresponding departure? – Peter Mularien Aug 16 '17 at 13:55
  • Add the date stamp with `Arr Time` and `Dep Time` to find exact differences. – Vijay Anand Pandian Sep 16 '18 at 05:43
  • Old question but FWIW I think this could have been benefited greatly from grouping by Registration and RouteID. – vmg Feb 15 '19 at 10:38
  • Is this question still relevant? if so, maybe you can explain the terms like what you mean by "base", "factory", "stop" and maybe prepare the output you'd expect for the example dataframe. – jottbe Jul 31 '19 at 18:47

1 Answers1

1

After filtering the relevant data rows, convert the "Arr time" & "Dep time" to timestamps according to the "Date" & "Days" columns

df_home = df[df["Dep Loc"].isin(["Home"])]
df_home = df_home[df_home["Arr Loc"].isin(["Home"])]

df_home['Dep Time']=df_home['Date']+' '+df_home['Dep Time'] 

df_home['Arr Time']=df_home['Date']+' '+df_home['Arr Time'] 

df_home['Date']=pd.to_datetime(df_home['Date'])

df_home['Dep Time']=pd.to_datetime(df_home['Dep Time'])
df_home['Arr Time']=pd.to_datetime(df_home['Arr Time'])
df_home['Dep Time']=pd.to_datetime(df_home['Dep Time'])+pd.to_timedelta(df_home['Days'], unit='d')

Finally, difference between "Dep time" & "Arr time" would give the time duration(in minutes)

df_home['diff_duration']=(df_home['Dep Time']-df_home['Arr Time']).astype('timedelta64[m]')
sai kumar
  • 44
  • 3