2

So, let's say I have a data frame like this.

df = pd.DataFrame({'person':['A', 'A', 'B', 'B', 'A'],
                   'datetime':['2018-02-26 10:49:32', '2018-02-26 10:58:03', '2018-02-26 10:51:10','2018-02-26 10:58:45', '2018-02-26 10:43:34'], 
                   'location':['a', 'b', 'c', 'd', 'e']})

That shows

person  datetime    location
    A   2018-02-26 10:49:32 a
    A   2018-02-26 10:58:03 b
    B   2018-02-26 10:51:10 c
    B   2018-02-26 10:58:45 d
    A   2018-02-26 10:43:34 e

Then I sorted them based on each person and time

df.sort_values(by=['person', 'datetime'])

Which would sort the movement of each person then by their time.

person  datetime    location
4   A   2018-02-26 10:43:34 e
0   A   2018-02-26 10:49:32 a
1   A   2018-02-26 10:58:03 b
2   B   2018-02-26 10:51:10 c
3   B   2018-02-26 10:58:45 d

Which can be read as person A goes from place e, then goes to a, then goes to b. Meanwhile person B goes from place c then to place d.

I want to create a dataframe which tracks each person's movement, like this.

| person | prev_datetime       | prev_loc | next_datetime       | next_loc |
|--------|---------------------|----------|---------------------|----------|
| A      | 2018-02-26 10:43:34 | e        | 2018-02-26 10:49:32 | a        |
| A      | 2018-02-26 10:49:32 | a        | 2018-02-26 10:58:03 | b        |
| B      | 2018-02-26 10:51:10 | c        | 2018-02-26 10:58:45 | d        |

I haven't really had any idea how to do this. Thanks.

catris25
  • 1,173
  • 3
  • 20
  • 40

1 Answers1

1

Use DataFrameGroupBy.shift by 2 columns, and last remove last duplicated rows by person column by Series.duplicated with rename columns:

df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.sort_values(by=['person', 'datetime'])

df1[['next_datetime','next_loc']] = df1.groupby('person')['datetime','location'].shift(-1)
d = {'datetime':'prev_datetime','location':'prev_loc'}
df2 = df1[df1['person'].duplicated(keep='last')].rename(columns=d)
print (df2)
  person       prev_datetime prev_loc       next_datetime next_loc
4      A 2018-02-26 10:43:34        e 2018-02-26 10:49:32        a
0      A 2018-02-26 10:49:32        a 2018-02-26 10:58:03        b
2      B 2018-02-26 10:51:10        c 2018-02-26 10:58:45        d
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • wow, thanks a lot @jezrael. Let me try to understand it first, since I have never used `DataFrameGroupBy.shift` before. But it seems like here you overwrite the existing dataframe though. Is it possible to make an entirely new dataframe? – catris25 Mar 26 '20 at 13:16
  • @catris25 - Sure, answer was [edited](https://stackoverflow.com/posts/60867674/revisions) – jezrael Mar 26 '20 at 13:17
  • Amazing! Thanks @jezrael. Can I ask something? What is the purpose of `.shift()` here? I tried to read the documentation, but I am not sure I understood. – catris25 Mar 26 '20 at 13:35
  • @catris25 - it shift values per groups, maybe better seen if add `print (df1)` after `df1[['next_datetime','next_loc']] = df1.groupby('person')['datetime','location'].shift(-1)`, same principe but shift with default `shift(1)` is used [here](https://stackoverflow.com/questions/53335567/use-pandas-shift-within-a-group) – jezrael Mar 26 '20 at 13:39