-1

So, i have the following data frame on pandas:

enter image description here

I need to create another data frame with the average delivery time by day, by i'm having a hardtime with this task.

So, as you can see, I have all the 365 days of 2018 and I need to calculate what was the average time the user had to wait before getting his order delivered. I only need the hours, minutes and seconds, because the number of days will never be grater than 0. And the delivery time is timedelta64 format.

Thank you folks!

2 Answers2

0

User groupby and mean

import pandas as pd
from io import StringIO

#Data preprocessing(ignore)
data = StringIO('''
2018-01-01 ,0 days 00:58:26 
2018-01-01 ,0 days 01:27:04 
2018-01-01 ,0 days 00:17:27 
2018-01-01 ,0 days 00:14:26 
2018-01-01 ,0 days 01:08:33 
''')

#Converting to datetime and timedelta object
df = pd.read_csv(data,names=['date','delivery_time'],parse_dates=['date'])
df['delivery_time'] = pd.to_timedelta(df['delivery_time'])

#Grouping by date and then finding mean of delivery time
df.groupby(['date']).mean(numeric_only=False)

Output:

            delivery_time
date    
2018-01-01  0 days 00:49:11.200000

using dt.components you can easily extract hours minutes and seconds from your timedelta column. Something like following should generate your result.

df.groupby(['date']).mean(numeric_only=False)['delivery_time'].dt.components.iloc[:, 1:4]
k33da_the_bug
  • 812
  • 8
  • 16
0

This will get you the average delivery time (in timedelta[ns]) per day in a new dataframe:

  1. Convert the total delivery time to total seconds with numpy's timedelta64[ns]
import pandas as pd
import numpy as np

df['del_time_sec'] = df[:8]['delivery_time'] / np.timedelta64(1, 's')
  1. Get the average time, using GroupBy.transform, per day using pandas.Series.dt.day
df['avg_sec'] = (df.groupby(df['date'].dt.day)['del_time_sec'].transform('mean'))
  1. Convert back to datetime, and get your answer in a new dataframe:
df['AVG_del_time_perday'] = pd.to_timedelta(df['avg_sec'], unit='s')
res = df[['date','AVG_del_time_perday']].drop_duplicates()

res
Out[116]: 
        date       AVG_del_time_perday
0 2018-01-01    0 days 00:49:11.200000
5 2018-12-31 0 days 00:24:16.666666667
sophocles
  • 13,593
  • 3
  • 14
  • 33