I am new to Mahout and am building an implicit-feedback recommender using the parallelALS job given here. Each row of my dataset consists of user_id, product_id, and preference_score (the number of visits the user made to the product). The user and product IDs are of type long. I have a million data points of this kind after filtering out single or double visits.

I have written a bash script that runs the two jobs, “parallelALS” and “recommendfactorized”, just as shown in the example “factorize-movielens-1M”. After running the script, the resulting recommendations seem to have a bug. The format of each row of the results (as explained in several blog posts) seems to be:
user_id [product_id:score,…]

However, all the product_ids in every row are 0. I am not sure what is going wrong here. Is this a problem with the dataset, a matter of tuning parameters (alpha, lambda, etc.), or something else?

Sneha
  • Post an example of your data. The IDs for users and items must be Mahout IDs, which means consecutive integers. You need to maintain dictionaries to map user and item IDs to and from Mahout IDs (two HashBiMaps or a database will work); this is something new users often miss. – pferrel Jul 08 '14 at 01:04

1 Answer

The IDs for users and items must be Mahout IDs, which means consecutive integers. You need to maintain dictionaries to map user and item IDs to and from Mahout IDs (two HashBiMaps or a database will work); this is something new users often miss.
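The dictionary bookkeeping could look roughly like the sketch below. It is written in Python for brevity rather than Java, and a plain dict plus a list stand in for a HashBiMap. The names `IdIndex` and `remap` are made up for illustration, and numbering from 0 is an assumption, not something Mahout is confirmed to require.

```python
class IdIndex:
    """Two-way map between original IDs and consecutive ints (0, 1, 2, ...)."""

    def __init__(self):
        self._to_index = {}      # original ID -> consecutive int
        self._to_original = []   # consecutive int -> original ID

    def index_of(self, original_id):
        """Return the int for original_id, assigning the next free one if unseen."""
        idx = self._to_index.get(original_id)
        if idx is None:
            idx = len(self._to_original)
            self._to_index[original_id] = idx
            self._to_original.append(original_id)
        return idx

    def original_of(self, idx):
        """Translate a consecutive int back to the original ID."""
        return self._to_original[idx]


def remap(rows, users, items):
    """Yield (user_idx, item_idx, score) for each (user_id, product_id, score) row."""
    for user_id, product_id, score in rows:
        yield users.index_of(user_id), items.index_of(product_id), score
```

You would run every input row through `remap` before writing the file that the ALS job reads, keep both `IdIndex` objects (or persist them), and call `original_of` on the IDs in the recommendation output to get back your real user and product IDs.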

To tell for sure, post an example of your input data.

pferrel