1

For a given data frame df

timestamps = [
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0), # person 1
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0), # person 2
    datetime.datetime(2018, 1, 1, 11, 0, 0, 0), # person 2
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0), # person 2
    datetime.datetime(2018, 1, 1, 10, 0, 0, 0), # person 3
    datetime.datetime(2018, 1, 2, 11, 0, 0, 0), # person 3
    datetime.datetime(2018, 1, 4, 10, 0, 0, 0), # person 3
    datetime.datetime(2018, 1, 5, 12, 0, 0, 0)  # person 3
]
df = pd.DataFrame({'person': [1, 2, 2, 2, 3, 3, 3, 3], 'timestamp': timestamps })

I want to calculate for each person (df.groupby('person')) the time differences between all timestamps of that person, which I would to with diff().

df.groupby('person').timestamp.diff()

is just half the way, because the mapping back to the person is lost.

How could a solution look like?

jpp
  • 159,742
  • 34
  • 281
  • 339
Michael Dorner
  • 17,587
  • 13
  • 87
  • 117
  • No, not really a duplicate. I missed that point that I need `transform` for the `diff`. That is the answer to my question. – Michael Dorner Mar 28 '18 at 09:00
  • Have you *actually tried* the above dup? It works fine.. – jpp Mar 28 '18 at 09:02
  • Yes, as you can see in my question, I did exactly this (using `diff()`), but this was not the proper solution. As @jezreal pointed out I missed that diff does not aggregate values, but transform does. – Michael Dorner Mar 28 '18 at 11:28
  • I think this is a case of user error. If you apply logic *exactly as described in the post*, it works. I know, because I checked. I'll post an example here. – jpp Mar 28 '18 at 11:29
  • 1
    My apologies, indeed, it was my error! Sorry about that! – Michael Dorner Mar 28 '18 at 11:32

2 Answers2

2

i think you should use

df.groupby('person').timestamp.transform(pd.Series.diff)

thelastworm
  • 544
  • 4
  • 14
1

There is problem diff no aggregate values, so possible solution is transform:

df['new'] = df.groupby('person').timestamp.transform(pd.Series.diff)
print (df)
   person           timestamp             new
0       1 2018-01-01 10:00:00             NaT
1       2 2018-01-01 10:00:00             NaT
2       2 2018-01-01 11:00:00 0 days 01:00:00
3       2 2018-01-02 11:00:00 1 days 00:00:00
4       3 2018-01-01 10:00:00             NaT
5       3 2018-01-02 11:00:00 1 days 01:00:00
6       3 2018-01-04 10:00:00 1 days 23:00:00
7       3 2018-01-05 12:00:00 1 days 02:00:00
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252