
I have a dataset with only two fields, itemId and productid. I would like to try Mahout ALS or MLlib for implicit feedback. Is the best approach to create the preference column in the dataset with all 1's? Reading the Koren paper (Collaborative Filtering for Implicit Feedback Datasets), I see that all confidence values would be the same when all preferences are the same. Is that OK? Thanks!

zero323

1 Answer


If you plan to try ALS for your recommendations, I'd encourage you to go straight to MLlib/ML from Spark. It has a blocked implementation of ALS, and it's really fast. When creating the RDD[Rating], just use 1.0 as the rating value. Remember to train ALS with the parameter that says you supplied implicit feedback (implicitPrefs = true).
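To see why all-1.0 values are fine, here is a small plain-Python sketch (no Spark needed) of the preference and confidence definitions from the implicit feedback paper, p_ui = 1 if r_ui > 0 else 0 and c_ui = 1 + alpha * r_ui. The sample pairs, the helper names and the alpha value are my own assumptions for illustration, not anything from your data or from the Spark API.

```python
# Hypothetical sample data: observed (user, item) pairs with no rating column.
pairs = [("u1", "p1"), ("u1", "p2"), ("u2", "p2")]

ALPHA = 40.0  # assumed confidence scaling factor; tune it for your data


def to_ratings(pairs, value=1.0):
    """Attach the same implicit-feedback value to every observed pair."""
    return [(user, item, value) for user, item in pairs]


def preference(r):
    """Binary preference p_ui: 1 if an interaction was observed, else 0."""
    return 1.0 if r > 0 else 0.0


def confidence(r, alpha=ALPHA):
    """Confidence c_ui = 1 + alpha * r_ui."""
    return 1.0 + alpha * r


ratings = to_ratings(pairs)
observed = {(user, item): r for user, item, r in ratings}

users = sorted({user for user, _ in pairs})
items = sorted({item for _, item in pairs})

# Walk over every user/item cell, including the ones that were never observed.
for user in users:
    for item in items:
        r = observed.get((user, item), 0.0)
        print(user, item, preference(r), confidence(r))
```

Every observed pair ends up with the same confidence (1 + alpha), but the unobserved pairs keep confidence 1 and preference 0, so the model can still tell "interacted" from "no information". What you lose with all 1's is only the ability to weight some interactions more heavily than others.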

You can check an example here: https://github.com/apache/spark/blob/v1.3.1/examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala#L113

And if you are brave enough to use the new ML package with DataFrames: https://github.com/apache/spark/blob/v1.3.1/examples/src/main/scala/org/apache/spark/examples/ml/MovieLensALS.scala

The new pipeline is probably better for experimenting with parameters. But I find it difficult to work with in my code, where I have to access the factorized matrices from the learned model.

Good luck and have fun!