Display Pearson correlation similarity between two users in Apache Mahout

Question

Does anybody know how to obtain the numeric value of similarity between any two users of a given dataset in Apache Mahout?

score 0 · Answer 1 · answered Dec 20 '14 at 17:25

There are several ways. What does your data look like? Is it interaction data such as purchases, views, or ratings?

If so, itemsimilarity or spark-itemsimilarity will work, but swap the item and user IDs in the input so that the job compares users rather than items. If you encode the data as a sparse matrix, one row per user, you can also use rowsimilarity or spark-rowsimilarity.
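The ID swap can be done as a preprocessing step before the job runs. Here is a minimal sketch, assuming the input is comma-separated `userID,itemID[,value]` text. That layout and the helper names `swap_user_item` and `swap_text` are assumptions for illustration, not part of Mahout:

```python
import csv
import io


def swap_user_item(lines):
    """Yield rows with the first two columns (user ID, item ID) exchanged.

    Any further columns (for example a preference value) are kept as-is.
    """
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield [row[1], row[0], *row[2:]]


def swap_text(text):
    """Return the swapped data as comma-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(swap_user_item(io.StringIO(text)))
    return out.getvalue()
```

Feeding the swapped file to an item-similarity job should then yield pairs of users rather than pairs of items, since the job treats whatever is in the item column as the thing to compare.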

For the hadoop jobs the IDs must be Mahout IDs: non-zero row and column numbers for the items and users. For the Spark jobs you can use whatever IDs you want; they are read as text, so each must be a unique string.
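If your raw data has string IDs and you want to use the hadoop jobs, one option is to map each ID to a positive integer first. This is a rough sketch only: the helper names `build_id_map` and `encode` are hypothetical, and numbering from 1 is an assumption based on the "non-zero" requirement above:

```python
def build_id_map(ids):
    """Assign each distinct ID a positive integer, starting at 1, in first-seen order."""
    mapping = {}
    for raw in ids:
        if raw not in mapping:
            mapping[raw] = len(mapping) + 1
    return mapping


def encode(interactions):
    """Convert (user, item) string pairs into integer pairs.

    Returns the encoded pairs plus both maps so that results can be
    translated back to the original IDs afterwards.
    """
    interactions = list(interactions)
    user_map = build_id_map(user for user, _ in interactions)
    item_map = build_id_map(item for _, item in interactions)
    encoded = [(user_map[u], item_map[i]) for u, i in interactions]
    return encoded, user_map, item_map
```

Keep the maps around: the job output will contain the integer IDs, and you need the maps to look up which users they refer to.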

Pearson is supported only by the hadoop jobs; the Spark jobs use the log-likelihood ratio (LLR) only. In collaborative filtering applications LLR is almost always better than the other "similarity" metrics.
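For reference, the number being asked about can be sketched in a few lines. This is the textbook Pearson correlation computed over the items both users rated, not Mahout's own code, so its handling of edge cases may differ from what the hadoop jobs produce. The function name `pearson` and the dict-of-ratings input are assumptions for illustration:

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Each argument maps item ID -> rating. Returns None when fewer than two
    items are shared or when either user's shared ratings have zero variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for constant ratings
    return cov / sqrt(var_x * var_y)
```

The result lies between -1 and 1. Items rated by only one of the two users are ignored, which is why users with little overlap can get misleadingly extreme values.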
