This is a question about how to make things properly with pandas (I use version 1.0
).
Let say I have a DataFrame with missions which contains an origin and one or more destinations:
mid from to
0 0 A [C]
1 1 A [B, C]
2 2 B [B]
3 3 C [D, E, F]
Eg.: For the mission (mid=1
) people will travel from A
to B
, then from B
to C
and finally from C
to A
. Notice, that I have no control on the datamodel of the input DataFrame.
I would like to compute metrics on each travel of the mission. The expected output would be exactly:
tid mid from to
0 0 0 A C
1 1 0 C A
2 2 1 A B
3 3 1 B C
4 4 1 C A
5 5 2 B B
6 6 2 B B
7 7 3 C D
8 8 3 D E
9 9 3 E F
10 10 3 F C
I have found a way to achieve my goal. Please, find bellow the MCVE:
import pandas as pd
# Input:
df = pd.DataFrame(
[["A", ["C"]],
["A", ["B", "C"]],
["B", ["B"]],
["C", ["D", "E", "F"]]],
columns = ["from", "to"]
).reset_index().rename(columns={'index': 'mid'})
# Create chain:
df['chain'] = df.apply(lambda x: list(x['from']) + x['to'] + list(x['from']), axis=1)
# Explode chain:
df = df.explode('chain')
# Shift to create travel:
df['end'] = df.groupby("mid")["chain"].shift(-1)
# Remove extra row, clean, reindex and rename:
df = df.dropna(subset=['end']).reset_index(drop=True).reset_index().rename(columns={'index': 'tid'})
df = df.drop(['from', 'to'], axis=1).rename(columns={'chain': 'from', 'end': 'to'})
My question is: Is there a better/easier way to make it with Pandas? By saying better I mean, not necessary more performant (it can be off course), but more readable and intuitive.