df.tail()
returns the last rows of the entire DataFrame, not the last rows of each session, so what you're looking for takes a bit more work. Here's some example code that solves the problem and generalizes to the last k rows per session:
import pandas as pd
import numpy as np
# create the dataset example
index = [1, 2, 3, 4, 5, 6, 7, 8]
session_uuid = [1, 1, 1, 1, 2, 2, 2, 2]
timestamp = [2, 4, 5, 7, 2, 4, 10, 15]
action = ["action1", "action2", "action3", "action4",
          "action1", "action2", "action3", "action4"]
df = pd.DataFrame(
    {
        "index": index,
        "session_uuid": session_uuid,
        "timestamp": timestamp,
        "action": action
    }
)
# the number of `last` actions you want
k = 2
# the dataframe to return will have k columns that are numbered
# sort by session_uuid and timestamp, keep the last k rows of each session, then regroup by session
last_k = df.sort_values(["session_uuid", "timestamp"]).groupby("session_uuid").tail(k).groupby("session_uuid")
# collect one row of actions per session, plus the matching session_uuid in the same order
rows = []
uuids = []
# go through each group (one per uuid)
for uuid, group in last_k:
    uuids.append(uuid)
    # grab the action values; reindex pads sessions with fewer than k actions with NaN
    actions = group["action"].reset_index(drop=True).reindex(range(k))
    rows.append(actions.tolist())
# the dataframe to return has k numbered columns
final_df = pd.DataFrame(rows, columns=np.arange(k))
# add the UUID column for reference and put it at the beginning
final_df.insert(loc=0, column="session_uuid", value=uuids)
print(final_df)
This code takes your example dataset and returns the last two actions for each session (adjust k to change how many). If a session has fewer than k actions, the missing positions are filled with NaN.
Sample output looks like:
   session_uuid        0        1
0             1  action3  action4
1             2  action3  action4
Or, if a session has fewer than k actions:
   session_uuid        0        1
0             1  action1      NaN
1             2  action3  action4
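If you'd rather avoid the explicit loop, the same result can probably be had with a pivot. The sketch below is one possible alternative, not the only way to do it: the helper name last_k_actions and the temporary "position" column are made up for illustration, and it assumes each session's rows should be ordered by timestamp, as in the example above.

```python
import pandas as pd


def last_k_actions(df: pd.DataFrame, k: int) -> pd.DataFrame:
    """Return one row per session_uuid holding its last k actions by timestamp.

    Hypothetical helper for illustration; assumes the columns
    session_uuid, timestamp and action from the example dataset.
    """
    # Sort once, then keep the last k rows of every session.
    tail = (
        df.sort_values(["session_uuid", "timestamp"])
        .groupby("session_uuid")
        .tail(k)
        .copy()
    )
    # Number the kept rows 0..k-1 inside each session; this becomes the column label.
    tail["position"] = tail.groupby("session_uuid").cumcount()
    # Spread the positions into columns; sessions with fewer than k rows get NaN.
    wide = tail.pivot(index="session_uuid", columns="position", values="action")
    # Guarantee exactly k columns even if every session is short.
    wide = wide.reindex(columns=range(k))
    wide.columns.name = None
    return wide.reset_index()


df = pd.DataFrame(
    {
        "session_uuid": [1, 1, 1, 1, 2, 2, 2, 2],
        "timestamp": [2, 4, 5, 7, 2, 4, 10, 15],
        "action": ["action1", "action2", "action3", "action4"] * 2,
    }
)
result = last_k_actions(df, k=2)
print(result)
```

On the example dataset this should give the same table as the loop version. The trade-off is that the pivot keeps everything vectorized, while the loop is easier to step through if you need custom handling per session.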