I have a dataframe containing users' trajectories and segments. A segment is the part of a trajectory between two stops. My df looks like this:
import pandas as pd

df = pd.DataFrame(
    {
        'trajectory': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
        'segment': [0, 2, 4, 1, 3, 5, 2, 5, 1, 2],
        'user': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'C'],
    }
)
df
   trajectory  segment user
0           1        0    A
1           1        2    A
2           1        4    A
3           2        1    B
4           2        3    B
5           2        5    B
6           3        2    A
7           3        5    A
8           3        1    A
9           4        2    C
- The segment numbers within a user's trajectory are not sequential, e.g. the segments of trajectory 3 of user A are 2, 5, 1, so 3 segments.
- Some users contribute more segments than others.
I want to plot the CDF of the number of segments per trajectory per user. The goal is to understand how many segments a user contributes per trajectory, on average.
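
One possible way to get there, as a minimal sketch rather than a definitive solution: count the distinct segments for each (user, trajectory) pair, then build the empirical CDF of those counts. The names `seg_counts`, `n_segments`, `ecdf` and `plot_cdf` are hypothetical, and counting distinct segment values (rather than rows) is an assumption about what "number of segments" should mean here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'trajectory': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
        'segment': [0, 2, 4, 1, 3, 5, 2, 5, 1, 2],
        'user': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'A', 'C'],
    }
)

# Assumption: "number of segments" means distinct segment values
# within each (user, trajectory) pair.
seg_counts = (
    df.groupby(['user', 'trajectory'])['segment']
    .nunique()
    .rename('n_segments')
    .reset_index()
)


def ecdf(values):
    """Return the sorted values and the fraction of observations <= each one."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def plot_cdf(counts):
    """Draw the empirical CDF as a step plot (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt

    x, y = ecdf(counts['n_segments'])
    fig, ax = plt.subplots()
    ax.step(x, y, where='post')
    ax.set_xlabel('segments per trajectory')
    ax.set_ylabel('CDF')
    return ax
```

Here `seg_counts` holds one row per (user, trajectory) pair. Calling `plot_cdf(seg_counts)` draws the curve over all pairs, and `seg_counts.groupby('user')['n_segments'].mean()` would give the per-user average if that is the quantity of interest.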