1

After finding the following link regarding calculating time differences using Pandas, I'm still stuck attempting to fit that knowledge to my own data. Here's what my dataset looks like:

In [10]: df 
Out[10]:
      id           time
   0  420 1/3/2018 8:32
   1  420 1/3/2018 8:36
   2  420 1/3/2018 8:42
   3  425 1/7/2018 12:35
   4  425 1/7/2018 14:29
   5  425 1/7/2018 16:15
   6  425 1/7/2018 16:36
   7  427 1/11/2018 20:50
   8  428 1/13/2018 16:35
   9  428 1/13/2018 17:36

I'd like to perform a groupby or another function on ID where the output is:

In [11]: pd.groupby(df[id])
Out [11]:

      id   time (duration)
   0  420  0:10
   1  425  4:01
   2  427  0:00
   3  428  1:01

The types for id and time are int64 and object respectively. Using python3 and pandas 0.20.

Edit: Coming from SQL, this appears that it would be functionally equivalent to:

select id, max(time) - min(time)
from df
group by id

Edit 2: Thank you all for the quick responses. All of the solutions give me some version of the following error. Not sure what is relevant to my particular dataset that I'm missing here:

TypeError: unsupported operand type(s) for -: 'str' and 'str'
Sepa
  • 111
  • 1
  • 9

3 Answers3

3

groupby with np.ptp

df.groupby('id').time.apply(np.ptp)

id
420   00:10:00
425   04:01:00
427   00:00:00
428   01:01:00
Name: time, dtype: timedelta64[ns]
user3483203
  • 50,081
  • 9
  • 65
  • 94
1

Group the dataframe by event IDs and select the smallest and the largest times:

df1 = df.groupby('id').agg([max, min])

Find the difference:

(df1[('time','max')] - df1[('time','min')]).reset_index()
#    id          0
#0  420   00:10:00
#1  425   04:01:00
#2  427   00:00:00
#3  428   01:01:00
DYZ
  • 55,249
  • 10
  • 64
  • 93
0

You need to sort the dataframe by time and group by id before getting the difference between time in each group.

df['time'] = pd.to_datetime(df['time'])
df.sort_values(by='time').groupby('id')['time'].apply(lambda g: g.max() - g.min()).reset_index(name='duration')

Output:

     id duration
0   420 00:10:00
1   425 04:01:00
2   427 00:00:00
3   428 01:01:00
harvpan
  • 8,571
  • 2
  • 18
  • 36