Does anybody know how to obtain the numeric value of similarity between any two users of a given dataset in Apache Mahout?

1 Answer

There are several ways, depending on what your data looks like. Is it interaction data such as purchases, views, or ratings?

If so, itemsimilarity or spark-itemsimilarity will work, but swap the item and user IDs in the input so that the job compares users instead of items. If you encode the data as a sparse matrix, one row per user, you can also use rowsimilarity or spark-rowsimilarity.
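The ID swap can be done as a small preprocessing step before running the job. Here is a minimal sketch, assuming the input is delimited text with the user ID in the first field, the item ID in the second, and an optional preference value after that. The function name `swap_user_item` and the sample rows are made up for illustration; they are not part of Mahout.

```python
import csv
import io


def swap_user_item(lines, delimiter=","):
    """Yield rows with the first two fields (user ID, item ID) exchanged.

    Assumes each line looks like: userID,itemID[,value]
    Any trailing fields (e.g. a rating) are passed through unchanged.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield [row[1], row[0], *row[2:]]


# Hypothetical sample data: two users, two items, with ratings.
sample = io.StringIO("1,101,5.0\n1,102,3.0\n2,101,4.0\n")
swapped = list(swap_user_item(sample))

for row in swapped:
    print(",".join(row))
```

After the swap, the job sees users in the item position, so the "item similarities" it writes out are really user-to-user similarities.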

For the Hadoop jobs the IDs must be Mahout IDs, i.e. non-zero row and column numbers for the items and users. For the Spark jobs you can use whatever IDs you want; they are read as text, so each one must be a unique string.

Pearson is only supported by the Hadoop jobs. The Spark jobs use the log-likelihood ratio (LLR) only. In collaborative filtering applications LLR is almost always better than the other "similarity" metrics.
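To give a feel for the number LLR produces for one pair of users, here is a rough sketch of the entropy form of Dunning's log-likelihood ratio on a 2x2 contingency table. This is my own illustration, not code taken from Mahout, and the count names are assumptions: `k11` is the number of items both users interacted with, `k12` and `k21` are the items only one of them interacted with, and `k22` is the items neither touched.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR score for a 2x2 table of co-occurrence counts.

    k11: items both users interacted with
    k12: items only the first user interacted with
    k21: items only the second user interacted with
    k22: items neither user interacted with
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))


# Hypothetical counts: the two users overlap on 10 of 20 items
# and never disagree.
print(log_likelihood_ratio(10, 0, 0, 10))
```

A score near zero means the overlap is about what chance would give; larger scores mean the two users' histories are more strongly associated. The raw score is not bounded to [0, 1], so if you need a normalized similarity you have to rescale it yourself.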

pferrel