We have a set of sequences of taxi positions, and we want to cluster them by taking into account the sequential patterns within each sequence. For example, let T1, T2, T3, T4 be the travels and a, b, c, d, e be the set of places. The data looks like this:
- T1 b c b a d
- T2 a
- T3 a b a b a b c e d
- T4 b c d c b d c a
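
For concreteness, the same example can be written as a small Python structure. This is only an illustrative sketch of the data shape; the names `travels` and `lengths` are made up for this post and do not come from any library.

```python
# Example travels from the list above, stored as lists of place labels.
# `travels` and `lengths` are hypothetical names used only for illustration.
travels = {
    "T1": ["b", "c", "b", "a", "d"],
    "T2": ["a"],
    "T3": ["a", "b", "a", "b", "a", "b", "c", "e", "d"],
    "T4": ["b", "c", "d", "c", "b", "d", "c", "a"],
}

# Number of visited places per travel; the values differ from travel to travel.
lengths = {name: len(seq) for name, seq in travels.items()}
print(lengths)
```

Each travel is an ordered list, so both the order of places and the number of places differ between travels.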
The problem is that the sequences do not all have the same length. How can we cluster this type of data using EM? Since it does not accept variable-length data, is there a way we can customize it?