1

I have customer's data based on his stay in the shop. The shop has 4 zones; zone 1,2,3 and 4. Now every 2 minutes, I get his reading as 10 numbers based on which zone he is in. EX:

1-1-1-1-1-1-1-1-3-3-2
4-4-3-3-3-3-3-2-1-3-3
3-4-1-2-2-3-1-4-2-1-4

Basically, I expect that there are customers who mostly are in a particular zone and they are clustered accordingly. So, in the first sequence, the customer seems to prefer zone 1, the next one zone 3 and the last one is like noise.

All I am feeding to the program is a bunch of sequences (unlabeled). How do I generate a distance/dissimilarity matrix that calculates the distances between each sequence in Python?

Sakib Shahriar
  • 121
  • 1
  • 12

2 Answers2

0

After a little bit of digging, I came across the textdistance library in python.

https://pypi.org/project/textdistance/

It seems to be working well for this problem, even though my input is a sequence of integers.

Sakib Shahriar
  • 121
  • 1
  • 12
0

You can use cosine or euclidean distances to calculate the distance.

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.cosine.html

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html
nag
  • 749
  • 4
  • 9