I am sorry if I couldn't word the question title perfectly.
This is actually quite similar to what I have asked previously before here. Which I actually got a solution for, but now I am facing different problem.
So I have the following dataframe
df = pd.DataFrame({'person':['A', 'A', 'B', 'B', 'A', 'C', 'B', 'C'],
'datetime':['2018-02-26 10:49:32', '2018-02-26 10:58:03', '2018-02-26 10:51:10',
'2018-02-26 10:58:45', '2018-02-26 10:43:34', '2018-02-26 10:49:51',
'2018-02-26 10:51:51', '2018-02-26 10:55:10'],
'location':['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})
person datetime location
A 2018-02-26 10:49:32 a
A 2018-02-26 10:58:03 b
B 2018-02-26 10:51:10 c
B 2018-02-26 10:58:45 d
A 2018-02-26 10:43:34 e
C 2018-02-26 10:49:51 f
B 2018-02-26 10:51:51 g
C 2018-02-26 10:55:10 h
And if we sort it by person and time, we get
df.sort_values(by=['person', 'datetime'])
person datetime location
A 2018-02-26 10:43:34 e
A 2018-02-26 10:49:32 a
A 2018-02-26 10:58:03 b
B 2018-02-26 10:51:10 c
B 2018-02-26 10:51:51 g
B 2018-02-26 10:58:45 d
C 2018-02-26 10:49:51 f
C 2018-02-26 10:55:10 h
Which you can read as person A goes to location 'e', then goes to location 'a', then goes to 'b'. Same with person B, he goes to location 'c', then 'g', then 'd', and so on.
I want to turn it into like this
person prev_datetime prev_loc curr_datetime curr_loc next_datetime next_loc
A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b 2018-02-26 10:43:34 e
B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d 2018-02-26 10:51:10 c
C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h 2018-02-26 10:49:51 f
So, you see person A goes to location 'e', then goes to location 'a', then goes to 'b', but in the end we make this person return to e again (the beginning).
I applied what I have learned from the previous question I asked and did some modification.
df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.sort_values(by=['person', 'datetime'])
df1[['curr_datetime','curr_loc']] = df1.groupby('person')['datetime','location'].shift(-1)
d = {'datetime':'prev_datetime','location':'prev_loc'}
df2 = df1[df1['person'].duplicated(keep='last')].rename(columns=d)
Outputs
person prev_datetime prev_loc next_datetime next_loc
A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a
A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g
B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h
Then
df2[['next_datetime','next_loc']] = df2.groupby('person')['curr_datetime','curr_loc'].shift(-1)
Outputs
person prev_datetime prev_loc curr_datetime curr_loc next_datetime next_loc
A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b NaT nan
B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d NaT nan
C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h NaT nan
By default, some of those are Nan, so I did the following,I want to fill those nan values with the min() of the trips by each person.
df2.loc[df2["next_loc"].isnull()][['next_datetime', 'next_loc']] = df2.loc[df2.groupby("person")["prev_datetime"].idxmin()][['prev_datetime', 'prev_loc']]
Which is not working, there is no error, but the dataframe still doesn't show the data as intended (still Nan). I am confused because there is no error message, but it still doesn't work. I don't know if there's better solution than this. Thanks.