The scenario is like this:

I am trying to build a recommender with Apache Mahout. I have some sample preference data (user, item, preference value) that I use to generate the similarity matrix and determine item-item similarities. The actual preference data is much larger than the sample. Every item ID in the actual data is also present in the sample, but the sample contains far fewer user IDs than the actual data.

Now, when I run the recommender on the actual data, it keeps failing with an error saying that the user ID does not exist, because that user was not present in the sample data. How can I inject new user IDs and their preferences into the Mahout recommender so that it can generate recommendations for any user on the fly, based on item-item similarity? If there is any other way to generate recommendations for a new user, please suggest it.

Thanks.

Taryn

1 Answer

If you think your sample data is complete enough for computing the item-item similarities, why don't you precompute them and store them in a collection such as `Collection<GenericItemSimilarity.ItemItemSimilarity> correlationMatrix = new ArrayList<GenericItemSimilarity.ItemItemSimilarity>();`? From that collection you can then create your ItemSimilarity like this: `ItemSimilarity similarity = new GenericItemSimilarity(correlationMatrix);`
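To show why precomputed item-item similarities are enough to score a user who was not in the sample, here is a minimal plain-Java sketch. It is not Mahout code: the class `ItemBasedSketch`, the helper `key`, and the method `estimate` are hypothetical names made up for illustration, and the similarity-weighted average is one common way to estimate a preference, not necessarily the exact formula Mahout applies.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only: none of these names are Mahout classes.
class ItemBasedSketch {

    // Order-independent key for an item pair (hypothetical helper).
    static String key(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Estimate a preference for `candidate` as the similarity-weighted
    // average of the preferences the user has already expressed.
    // Returns NaN when no similarity to any rated item is known.
    static double estimate(Map<String, Double> similarities,
                           Map<Long, Double> userPrefs,
                           long candidate) {
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<Long, Double> pref : userPrefs.entrySet()) {
            Double sim = similarities.get(key(candidate, pref.getKey()));
            if (sim == null) {
                continue; // no known similarity for this pair
            }
            weighted += sim * pref.getValue();
            totalWeight += Math.abs(sim);
        }
        return totalWeight == 0.0 ? Double.NaN : weighted / totalWeight;
    }

    public static void main(String[] args) {
        // Similarities precomputed once, e.g. from the sample data.
        Map<String, Double> sims = new HashMap<>();
        sims.put(key(1, 3), 0.5);
        sims.put(key(2, 3), 1.0);

        // A user who never appeared in the sample: only their own
        // preferences are needed at recommendation time.
        Map<Long, Double> newUser = new HashMap<>();
        newUser.put(1L, 4.0);
        newUser.put(2L, 2.0);

        System.out.println(estimate(sims, newUser, 3L));
    }
}
```

The point of the sketch is that the similarity map depends only on items, so a new user needs nothing more than their own preferences to receive an estimate.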

I don't think it is a good idea to compute item-item similarities from only a sample of your preference data, because you might be missing a lot of useful data. If computing them on the fly is too slow, you can always precompute them, store them in a database, and load them when needed.

If you are still getting this error, then you are probably using your sample data model in the recommender class, or you are using UserSimilarity to compute the item similarities.

If you want to add new users, one option is to use Mahout's FileDataModel and update the file periodically to include them (I think you can create a new file with some suffix, but I am not sure). You can find more about this in the book Mahout in Action. The other option is a mutable data model: the in-memory DataModel implementations are immutable, so you would have to write your own DataModel, for example by extending AbstractDataModel, and implement the methods setPreference() and removePreference() yourself.

EDIT: I have an implementation of a MutableDataModel that extends AbstractDataModel. I can share it with you if you want.
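As a rough idea of what a mutable preference store involves, here is a minimal plain-Java sketch. It is not the implementation mentioned above and it does not implement Mahout's DataModel interface: `MutablePreferenceStore`, `getPreferences`, and `hasUser` are hypothetical names, and only `setPreference` and `removePreference` borrow their names from the methods discussed here. A real DataModel would have to provide many more methods.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store; it does not implement Mahout's DataModel.
class MutablePreferenceStore {

    // userID -> (itemID -> preference value)
    private final Map<Long, Map<Long, Double>> prefsByUser = new HashMap<>();

    // Add or overwrite one preference; unknown users are created on demand.
    void setPreference(long userID, long itemID, double value) {
        prefsByUser.computeIfAbsent(userID, id -> new HashMap<>()).put(itemID, value);
    }

    // Remove one preference; drop the user once no preferences remain.
    void removePreference(long userID, long itemID) {
        Map<Long, Double> prefs = prefsByUser.get(userID);
        if (prefs == null) {
            return; // unknown user, nothing to remove
        }
        prefs.remove(itemID);
        if (prefs.isEmpty()) {
            prefsByUser.remove(userID);
        }
    }

    // Read-only view of a user's preferences (empty if the user is unknown).
    Map<Long, Double> getPreferences(long userID) {
        Map<Long, Double> prefs = prefsByUser.get(userID);
        return prefs == null
                ? Collections.<Long, Double>emptyMap()
                : Collections.unmodifiableMap(prefs);
    }

    // True if at least one preference is stored for this user.
    boolean hasUser(long userID) {
        return prefsByUser.containsKey(userID);
    }
}
```

With a store like this, a user who was never in the sample data simply comes into existence on the first `setPreference` call, which is the behavior the question asks for. The sketch is not thread-safe; a recommender that is updated and queried concurrently would need synchronization on top of it.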

Dragan Milcevski
  • The item similarity matrix takes hardly 10 milliseconds to generate, as I have 400 items and 50000 users in the sample data, so I do not want to precompute it. The setPreference() and removePreference() methods throw a java.lang.UnsupportedOperationException because I am using a FileDataModel. – user3095388 Dec 13 '13 at 17:02
  • None of the Mahout classes implement these methods. You have to implement them yourself. What I was saying is that if you use FileDataModel, you can change the file and add preferences, and then call `Recommender.refresh()`, which forces the data model to reload and include the new users. You should really read in the book how this is done. I think for the new users you can create a separate file with the same name as the original file plus a number extension, i.e. file001 or something like that. Calling the refresh function on the recommender will also refresh all the caches. – Dragan Milcevski Dec 13 '13 at 20:52
  • Thank you very much, Dragan. Recommender.refresh() worked just fine for me. Can you tell me where I can read more about Recommender.refresh? I want only the changes to be refreshed by Recommender.refresh, not the whole thing loaded again. Thanks for the help. – user3095388 Dec 14 '13 at 13:13
  • As I mentioned in my response, Mahout in Action is the bible for Mahout. If you don't want to refresh everything, you can call the refresh method on the similarity class, e.g. ItemSimilarity.refresh(). There you have to provide a collection of already refreshed objects. Honestly, I haven't used it and I don't know exactly how it works, but the documentation says: **alreadyRefreshed**: Refreshables that are known to have already been refreshed as a result of an initial call to a `refresh(Collection)` method on some object. This ensures that objects in a refresh dependency graph aren't refreshed twice needlessly. – Dragan Milcevski Dec 14 '13 at 16:37
  • If you like the answer, you should mark it as accepted so that others can benefit from it; that will also encourage others to help you. – Dragan Milcevski Aug 04 '14 at 12:17
  • This wouldn't work. GenericDataModel is final and DataModel is abstract, so implementing setPreference/removePreference is not enough. – Stepan Yakovenko Nov 02 '18 at 14:48