-1

I am using Apache Mahout to write an item-based recommender (based on users' ratings of similar items), and I was wondering which of the following similarity metrics would be the best to use:

Pearson, Spearman, Euclidean, Tanimoto and Loglikelihood

tlauer

1 Answer

0

If you have preference values, you should use the Pearson correlation or Euclidean distance similarity metrics. If you don't have preference values, you should use the Tanimoto coefficient or log-likelihood. To choose among the remaining candidates, you should evaluate them on your own dataset; that is what Mahout's evaluation framework is for. You can evaluate many metrics, such as Mean Squared Error (MSE), Mean Absolute Error (MAE), precision, recall, MAP...
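To make the comparison concrete, here is a minimal plain-Java sketch of the arithmetic behind two of those error metrics (MAE and RMSE). It does not use Mahout; the class name `RatingErrorMetrics` and the sample ratings are made up for illustration. In Mahout itself the evaluators (for example `AverageAbsoluteDifferenceRecommenderEvaluator` and `RMSRecommenderEvaluator`) also take care of holding out part of the data, which this sketch assumes has already been done.

```java
/** Illustrative error metrics over held-out ratings (not part of Mahout). */
final class RatingErrorMetrics {

    /** Mean absolute error: the average of |predicted - actual|. */
    static double meanAbsoluteError(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    /** Root mean squared error: the square root of the average squared difference. */
    static double rootMeanSquaredError(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    private static void checkInput(double[] predicted, double[] actual) {
        if (predicted.length == 0 || predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Hypothetical held-out ratings and the predictions from two candidate similarities.
        double[] actual = {4.0, 3.0, 5.0, 2.0};
        double[] candidateA = {3.5, 3.0, 4.0, 2.5};
        double[] candidateB = {4.5, 2.0, 4.5, 3.5};

        System.out.printf("A: MAE=%.3f RMSE=%.3f%n",
                meanAbsoluteError(candidateA, actual), rootMeanSquaredError(candidateA, actual));
        System.out.printf("B: MAE=%.3f RMSE=%.3f%n",
                meanAbsoluteError(candidateB, actual), rootMeanSquaredError(candidateB, actual));
    }
}
```

The candidate with the lower error on the held-out ratings is the one to prefer for that dataset; a different dataset may rank the similarities differently.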

I've also coded Adjusted Cosine Similarity, a variant of Pearson correlation that gives better results but is slower.
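For readers unfamiliar with the measure, this is a rough plain-Java sketch of the textbook adjusted cosine formula: each rating is centered on the rating user's own mean before the cosine between two items is taken, using only users who rated both items. It is not the implementation mentioned above and does not use Mahout's interfaces; the class name `AdjustedCosine` and the nested-map data layout are assumptions made for illustration.

```java
import java.util.Map;

/** Illustrative adjusted cosine similarity between two items (not Mahout code). */
final class AdjustedCosine {

    /**
     * @param ratingsByUser user id -> (item id -> rating); a hypothetical in-memory layout
     * @return similarity in [-1, 1], or NaN when it is undefined
     *         (no co-rating users, or zero variance around the user means)
     */
    static double similarity(Map<String, Map<String, Double>> ratingsByUser,
                             String itemA, String itemB) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map<String, Double> userRatings : ratingsByUser.values()) {
            Double a = userRatings.get(itemA);
            Double b = userRatings.get(itemB);
            if (a == null || b == null) {
                continue; // only users who rated both items contribute
            }
            // The user's mean over all of their ratings removes their rating bias.
            double mean = 0.0;
            for (double r : userRatings.values()) {
                mean += r;
            }
            mean /= userRatings.size();

            double da = a - mean;
            double db = b - mean;
            dot += da * db;
            normA += da * da;
            normB += db * db;
        }
        if (normA == 0.0 || normB == 0.0) {
            return Double.NaN;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Two hypothetical users, each with a mean rating of 3.
        Map<String, Map<String, Double>> ratings = Map.of(
                "u1", Map.of("A", 5.0, "B", 3.0, "C", 1.0),
                "u2", Map.of("A", 4.0, "B", 2.0, "C", 3.0));
        System.out.println("sim(A, C) = " + similarity(ratings, "A", "C"));
    }
}
```

Recomputing each user's mean inside the loop keeps the sketch short; caching the means per user would be the obvious first optimization, since that extra pass is one reason this measure costs more than a plain correlation.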

Dragan Milcevski
  • About "adjusted cosine similarity": doesn't Mahout come with any standard method for that? I'm building an item-based recommender and I need to take the users' rating bias into account, but I can't find a function that does that "out of the box". Do you know of any? – PLB Apr 02 '15 at 17:13
  • It is pretty easy to create your own Adjusted Cosine Similarity. Just implement the ItemSimilarity interface and take a look at PearsonCorrelationSimilarity. If you run into problems, open a new question and I will paste the code I've created; there is not enough space here. – Dragan Milcevski Apr 14 '15 at 07:18
  • Thanks. I actually created that question already: http://stackoverflow.com/questions/29419222/mahout-adjusted-cosine-similarity-for-item-based-recommender. People on the Mahout mailing list told me that this is very bad practice, though, and that it doesn't really make sense to code it. So I didn't even try, because I was running out of time. But if you want to paste your code I'll definitely have a look! – PLB Apr 14 '15 at 08:59
  • I pasted my code there. Just so you know, this implementation is a little bit slower than PearsonCorrelationSimilarity because of the way the similarity is computed. – Dragan Milcevski Apr 14 '15 at 09:07