Given the following data frame df:
import datetime
import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0),  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})
For each person (df.groupby('person')), I want to calculate the time differences between that person's consecutive timestamps, which I would do with diff().
df.groupby('person').timestamp.diff()
only gets me halfway, because the mapping back to the person is lost.
What could a solution look like?
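One direction that might work, as a minimal sketch: the Series returned by a grouped diff() keeps the original row index of df, so it can be assigned back as a new column next to person. The column name time_diff is made up here for illustration, and the first row of each person is expected to be NaT because it has no predecessor.

```python
import datetime
import pandas as pd

timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 2
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0),  # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0),  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps})

# The grouped diff() result is aligned to df's index, so assigning it
# as a column keeps each difference on the same row as its person.
# 'time_diff' is a hypothetical column name chosen for this sketch.
df['time_diff'] = df.groupby('person')['timestamp'].diff()

print(df)
```

If the rows of a person are not already in chronological order, sorting with df.sort_values(['person', 'timestamp']) before the diff would probably be needed.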