I have the following code:

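// Item-based recommender using uncentered cosine similarity between items.
ItemSimilarity itemSimilarity = new UncenteredCosineSimilarity(dataModel);
recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);
// Ask for the 5 items most similar to item 10.
List<RecommendedItem> items = recommender.mostSimilarItems(10, 5);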

My data model looks like this (user ID, item ID, score):

userid  itemid  score
1       6       5
1       10      3
1       11      5
1       12      4
1       13      5
2       2       3
2       6       5
2       10      3
2       12      5

When I run the code above, the result is: 13 6 11 2 12

I debugged the code and found that the items returned by recommender.mostSimilarItems(10, 5) all have the same score, namely 1. That is my problem. In my opinion, mostSimilarItems should take the item co-occurrence matrix into account:

      2   6   10  11  12  13
2     0   1   1   0   1   0
6     1   0   2   1   2   1
10    1   2   0   1   2   1
11    0   1   1   0   1   1
12    1   2   2   1   0   1
13    0   1   1   1   1   0

According to the matrix above, the items most similar to item 10 should be [6, 12, 11, 13, 2], because items 6 and 12 co-occur with item 10 more often than the other items do, shouldn't they? Can anyone explain this result to me? Thanks!

2 Answers

In your matrix you have much more data than in your input. In particular you seem to be imputing 0 values that are not in the data. That is why you are likely getting answers different from what you expect.
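To make the point about imputed zeros concrete, here is a minimal sketch in plain Java (no Mahout dependency). It assumes the similarity is computed only over users who rated both items, with no zeros filled in for missing ratings, which is my understanding of how the uncentered cosine behaves here but is an assumption. The class and method names are hypothetical, not Mahout APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper class for illustration only; not part of Mahout.
class CoRatedCosineSketch {

    // Builds item -> (user -> score) from the data in the question.
    static Map<Long, Map<Long, Double>> sampleData() {
        long[][] rows = {
            {1, 6, 5}, {1, 10, 3}, {1, 11, 5}, {1, 12, 4}, {1, 13, 5},
            {2, 2, 3}, {2, 6, 5}, {2, 10, 3}, {2, 12, 5}
        };
        Map<Long, Map<Long, Double>> byItem = new LinkedHashMap<>();
        for (long[] r : rows) {
            byItem.computeIfAbsent(r[1], k -> new LinkedHashMap<>())
                  .put(r[0], (double) r[2]);
        }
        return byItem;
    }

    // Number of users who rated both items (the co-occurrence count).
    static int coRatedCount(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        for (Long user : a.keySet()) {
            if (b.containsKey(user)) {
                n++;
            }
        }
        return n;
    }

    // Uncentered cosine restricted to co-rated users (assumption: missing
    // ratings are skipped, not treated as 0). Returns NaN with no overlap.
    static double cosine(Map<Long, Double> a, Map<Long, Double> b) {
        double sumXY = 0.0, sumX2 = 0.0, sumY2 = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue; // user did not rate the other item
            }
            double x = e.getValue();
            sumXY += x * y;
            sumX2 += x * x;
            sumY2 += y * y;
        }
        double denom = Math.sqrt(sumX2) * Math.sqrt(sumY2);
        return denom == 0.0 ? Double.NaN : sumXY / denom;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> data = sampleData();
        Map<Long, Double> target = data.get(10L);
        for (long other : new TreeSet<>(data.keySet())) {
            if (other == 10L) {
                continue;
            }
            Map<Long, Double> prefs = data.get(other);
            System.out.printf("item %d: co-rated users=%d, cosine=%.4f%n",
                    other, coRatedCount(target, prefs), cosine(target, prefs));
        }
    }
}
```

Under that assumption, items 2, 6, 11 and 13 all get a cosine of 1 against item 10 (up to rounding), regardless of whether one or two users rated both, and item 12 gets about 0.994. That is consistent with item 12 being listed last, and it shows that the co-occurrence count itself plays no part in the score.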

Sean Owen
  • Thank you very much for your reply, and sorry for my poor description. Please look at the updated question: the matrix is a co-occurrence matrix, that is, the number of users who rated both items. I still don't understand the result [13 6 11 2 12]. In my opinion it should be ordered as [6, 12, 11, 13, 2], because items 6 and 12 are more similar to item 10 than the other items are, aren't they? – zhouyan8603 May 25 '14 at 14:29

Mahout expects your IDs to be contiguous integers starting from 0. This applies to both your row and column IDs. Your matrix looks like it has missing IDs. Just having integers is not enough.

Could this be the problem? Not sure what Mahout would do with the input above.

I always keep a dictionary to map Mahout IDs to/from my own.
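A minimal sketch of such a dictionary, assuming external IDs arrive as strings and are assigned contiguous indices in order of first appearance. The class and its methods are hypothetical names for illustration, not a Mahout API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-way mapping between external IDs and contiguous indices.
class IdDictionary {
    private final Map<String, Integer> toIndex = new HashMap<>();
    private final List<String> toExternal = new ArrayList<>();

    // Returns the index for an external ID, assigning the next free
    // index (0, 1, 2, ...) the first time the ID is seen.
    int indexOf(String externalId) {
        Integer idx = toIndex.get(externalId);
        if (idx == null) {
            idx = toExternal.size();
            toIndex.put(externalId, idx);
            toExternal.add(externalId);
        }
        return idx;
    }

    // Maps a contiguous index back to the original external ID.
    String externalOf(int index) {
        return toExternal.get(index);
    }

    // Number of distinct external IDs seen so far.
    int size() {
        return toExternal.size();
    }
}
```

With the item IDs from the question, calling indexOf on "6", "10", "11", "12", "13", "2" in that order would assign indices 0 through 5, and externalOf translates results back before they are shown to users.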

pferrel
  • Thanks for your reply, but I don't think this is the real reason. Maybe mostSimilarItems() doesn't take the score into account? If possible, please try the demo above yourself. – zhouyan8603 May 27 '14 at 02:10