
I am trying to use the item-based recommender in Mahout. The data set contains 2.5M user-item interactions, without preference values. There are around 100 items and 100K users. It takes around 10 s to produce recommendations, whereas the user-based recommender takes less than a second on the same data.

ItemSimilarity sim = new TanimotoCoefficientSimilarity(dm); 
CandidateItemsStrategy cis = new SamplingCandidateItemsStrategy(10,10,10,dm.getNumUsers(),dm.getNumItems());
MostSimilarItemsCandidateItemsStrategy mis = new SamplingCandidateItemsStrategy(10,10,10,dm.getNumUsers(),dm.getNumItems());
Recommender ur = new GenericBooleanPrefItemBasedRecommender(dm,sim,cis,mis);

I read one of @Sean's answers, where he suggests using the above parameters for SamplingCandidateItemsStrategy, but I am not sure what they actually do.

Edit: 2.5M is the total number of user-item associations; there are 100K users and 100 items in total.

tshepang
soyeb84

1 Answer


Among the many reasons for choosing an item-based recommender, the main one is this: if the number of items is relatively low compared to the number of users, the performance advantage can be significant. The reverse also holds: if the number of users is relatively low compared to the number of items, a user-based recommender will have the performance advantage.

From your question I could not tell how many items and how many users your dataset has: at one point you mention 2.5M, and then 100K. In any case, if user-based recommendation is faster for you, you should choose that approach.

The exception is if your item-item similarities are fairly stable (not expected to change radically or frequently): then they are good candidates for precomputation. You could precompute the similarities between the items and use the precomputed values at recommendation time.
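
To make the precomputation idea concrete, here is a minimal sketch in plain Java, independent of Mahout. The class name PrecomputedTanimoto, its methods, and the tiny data set are all hypothetical and only illustrate the technique: compute the Tanimoto coefficient once for every item pair and look it up afterwards. With about 100 items there are only 100 * 99 / 2 = 4950 pairs, so the full table is small. In Mahout itself, GenericItemSimilarity is, as far as I know, the class intended for holding such precomputed values, but check the Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Mahout: precomputes item-item Tanimoto similarities.
class PrecomputedTanimoto {

    // Tanimoto coefficient for boolean data: |A and B| / |A or B|,
    // where A and B are the sets of users who interacted with each item.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set and probe the larger one.
        Set<Long> small = a.size() <= b.size() ? a : b;
        Set<Long> large = (small == a) ? b : a;
        int intersection = 0;
        for (Long user : small) {
            if (large.contains(user)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    // Computes the similarity of every unordered item pair once and stores it in both directions.
    static Map<Long, Map<Long, Double>> precompute(Map<Long, Set<Long>> usersByItem) {
        Map<Long, Map<Long, Double>> sims = new HashMap<>();
        List<Long> items = new ArrayList<>(usersByItem.keySet());
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                long x = items.get(i);
                long y = items.get(j);
                double s = tanimoto(usersByItem.get(x), usersByItem.get(y));
                sims.computeIfAbsent(x, k -> new HashMap<>()).put(y, s);
                sims.computeIfAbsent(y, k -> new HashMap<>()).put(x, s);
            }
        }
        return sims;
    }

    public static void main(String[] args) {
        // Made-up interactions: item id -> ids of users who interacted with it.
        Map<Long, Set<Long>> usersByItem = new HashMap<>();
        usersByItem.put(1L, new HashSet<>(Arrays.asList(1L, 2L, 3L)));
        usersByItem.put(2L, new HashSet<>(Arrays.asList(2L, 3L, 4L)));
        usersByItem.put(3L, new HashSet<>(Arrays.asList(5L)));

        Map<Long, Map<Long, Double>> sims = precompute(usersByItem);
        System.out.println("sim(1,2) = " + sims.get(1L).get(2L));
        System.out.println("sim(1,3) = " + sims.get(1L).get(3L));
    }
}
```

The table only needs to be rebuilt when the interaction data has changed enough to matter, which is why this pays off when similarities are stable.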

Also, since you don't have preference values, and if you want to use item-based similarity, you can think of enriching the similarity function with some pure item-item similarity based on some characteristics of the items. (This is just an idea).
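
One possible way to sketch that idea, again in plain Java rather than against the Mahout API: mix the interaction-based similarity with a content-based one using a weight. Everything here is an assumption for illustration only: the class BlendedItemSimilarity, the use of Jaccard overlap of item tags as the "pure" item-item similarity, and the weight alpha.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Mahout: blends two item-item similarity scores.
class BlendedItemSimilarity {

    // Content similarity as Jaccard overlap of item attributes (for example, tags).
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // Weighted mix: alpha weights the interaction-based score, (1 - alpha) the content-based one.
    static double blend(double interactionSim, double contentSim, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        return alpha * interactionSim + (1.0 - alpha) * contentSim;
    }

    public static void main(String[] args) {
        // Made-up tags for two items.
        Set<String> tagsA = new HashSet<>(Arrays.asList("drama", "french"));
        Set<String> tagsB = new HashSet<>(Arrays.asList("drama", "italian"));
        double contentSim = jaccard(tagsA, tagsB);
        double interactionSim = 0.5; // e.g. a Tanimoto score from the interaction data
        System.out.println("blended = " + blend(interactionSim, contentSim, 0.7));
    }
}
```

The weight would have to be tuned on your own data; there is no generally right value.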

Dragan Milcevski