After finding the following link regarding calculating time differences using Pandas, I'm still stuck attempting to fit that knowledge to my own data. Here's what my dataset looks like:
In [10]: df
Out[10]:
id time
0 420 1/3/2018 8:32
1 420 1/3/2018 8:36
2 420 1/3/2018 8:42
3 425 1/7/2018 12:35
4 425 1/7/2018 14:29
5 425 1/7/2018 16:15
6 425 1/7/2018 16:36
7 427 1/11/2018 20:50
8 428 1/13/2018 16:35
9 428 1/13/2018 17:36
I'd like to perform a groupby or another function on ID where the output is:
In [11]: pd.groupby(df[id])
Out [11]:
id time (duration)
0 420 0:10
1 425 4:01
2 427 0:00
3 428 1:01
The types for id and time are int64 and object respectively. Using python3 and pandas 0.20.
Edit: Coming from SQL, this appears that it would be functionally equivalent to:
select id, max(time) - min(time)
from df
group by id
Edit 2: Thank you all for the quick responses. All of the solutions give me some version of the following error. Not sure what is relevant to my particular dataset that I'm missing here:
TypeError: unsupported operand type(s) for -: 'str' and 'str'