Does anybody know how to obtain the numeric value of similarity between any two users of a given dataset in Apache Mahout?
There are several ways, depending on what your data looks like. Is it interaction data like purchases, views, or ratings?
If so, itemsimilarity or spark-itemsimilarity will work, but swap the item and user IDs in the input you feed them, so that the job compares users instead of items. If you encode the data as a sparse matrix, one row per user, you can also use rowsimilarity or spark-rowsimilarity.
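As a rough illustration of the swap, here is a minimal Python sketch. It assumes the interactions are already parsed into rows of `user, item[, strength]`; that layout and the name `swap_user_item` are my own assumptions for the example, not part of Mahout.

```python
def swap_user_item(rows):
    """Yield each interaction row with the user and item columns swapped.

    Assumes every row is a sequence like [user_id, item_id, ...optional fields].
    Any trailing fields (for example a rating) are passed through unchanged.
    """
    for row in rows:
        user_id, item_id, *rest = row
        yield [item_id, user_id, *rest]


# Hypothetical sample data: two users, three interactions.
interactions = [
    ["u1", "iphone", "1.0"],
    ["u1", "ipad", "1.0"],
    ["u2", "iphone", "1.0"],
]
swapped = list(swap_user_item(interactions))
```

Feeding the swapped rows to an item-similarity job makes it treat users as the "items", so the similarities that come out are user-to-user.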
For the Hadoop jobs the IDs must be Mahout IDs: non-zero row and column numbers for the items and users. For the Spark jobs you can use whatever IDs you want; they are read as text, so each one must be a unique string.
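If your IDs are strings and you want to use the Hadoop jobs, you need some mapping to integers first. A minimal Python sketch, assuming you keep the dictionary around to translate results back (`build_id_map` is a hypothetical helper, not a Mahout function):

```python
def build_id_map(string_ids, start=1):
    """Assign each distinct string ID a stable integer, beginning at `start`.

    Starting at 1 keeps every assigned number non-zero.
    IDs are numbered in order of first appearance.
    """
    mapping = {}
    for s in string_ids:
        if s not in mapping:
            mapping[s] = start + len(mapping)
    return mapping


user_ids = build_id_map(["bob", "alice", "bob", "carol"])
# Reverse lookup for turning job output back into the original strings.
user_names = {number: name for name, number in user_ids.items()}
```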
Pearson is only supported by the Hadoop jobs. The Spark jobs use only the log-likelihood ratio (LLR). In collaborative filtering applications LLR is almost always better than the other "similarity" metrics.
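To get a feel for the number LLR produces for a pair of users, here is a small Python sketch of the standard log-likelihood ratio statistic on a 2x2 contingency table, written in the entropy form. The function names and the `total_items` parameter are assumptions for illustration, and the Mahout jobs may rescale or threshold the raw score, so do not expect identical values.

```python
from math import log


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def user_llr(items_a, items_b, total_items):
    """LLR between two users, given the sets of items each interacted with."""
    a, b = set(items_a), set(items_b)
    k11 = len(a & b)                  # items both users touched
    k12 = len(a - b)                  # items only user A touched
    k21 = len(b - a)                  # items only user B touched
    k22 = total_items - len(a | b)    # items neither touched
    return llr(k11, k12, k21, k22)


score = user_llr({"iphone", "ipad"}, {"iphone", "ipad", "nexus"}, total_items=100)
```

A score near 0 means the overlap is about what chance would give; larger scores mean the two users' item sets are associated more strongly than chance explains.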

pferrel