Spark MLlib - Collaborative Filtering Implicit Feed

Question

I am building an implicit feedback recommender model with Spark 1.0.0, and I am trying to follow the example on their collaborative filtering page: http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#explicit-vs-implicit-feedback

I have also loaded the test dataset that the example references: http://codesearch.ruethschilling.info/xref/apache-foundation/spark/mllib/data/als/test.data

However, when I try to run the implicit feedback model:

val alpha = 0.01
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)

(the ratings are exactly those from their dataset, with rank = 10 and numIterations = 20), I get the following error:

scala> val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)
<console>:26: error: overloaded method value trainImplicit with alternatives:
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double)org.apache.spark.mllib.recommendation.MatrixFactorizationModel <and>
(ratings: org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating],rank: Int,iterations: Int,lambda: Double,blocks: Int,alpha: Double,seed: Long)org.apache.spark.mllib.recommendation.MatrixFactorizationModel
cannot be applied to (org.apache.spark.rdd.RDD[org.apache.spark.mllib.recommendation.Rating], Int, Int, Double)
val model = ALS.trainImplicit(ratings, rank, numIterations, alpha)

Interestingly, the model trains just fine when I use ALS.train instead of ALS.trainImplicit.

score 4 · Accepted Answer · answered Sep 03 '14 at 19:35

The example seems to be out of sync with the implementation: there is no overload of trainImplicit that takes four parameters, which is what the error message is telling you. However, if you look at the Scala source code for ALS, you'll see that the three-parameter overload is implemented in terms of the six-parameter overload via some 'magic numbers':

def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int)
    : MatrixFactorizationModel = {
    trainImplicit(ratings, rank, iterations, 0.01, -1, 1.0)
}

Matching those arguments against the six-parameter signature in your error message gives lambda = 0.01, blocks = -1 and alpha = 1.0. This suggests that 0.01 is a decent default value for lambda. (Perhaps good to check with someone having a deeper understanding of ML.) This may give you enough information to put together a reasonable call to the five- or six-parameter overload. (Of course, if you know enough to pick better values, that's great!)

For example:

val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, alpha)

or

val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01, -1, alpha)
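Since the positional overloads make it easy to mix up lambda and alpha, the setter-based form of ALS may be easier to read. The following is only a sketch of that idea, not what the documentation example does: the object name, the local master setting and the data path are assumptions, and lambda = 0.01 is simply the default copied from the source above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical standalone wrapper; in spark-shell, `sc` already exists and
// only the body of main (minus the context setup) would be needed.
object ImplicitAlsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ImplicitAlsSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Assumed location of the test dataset; each line looks like "user,item,rating".
    val ratings = sc.textFile("mllib/data/als/test.data").map { line =>
      val Array(user, item, rate) = line.split(',')
      Rating(user.toInt, item.toInt, rate.toDouble)
    }

    val alpha = 0.01 // value from the question

    // Each parameter is named by its setter, so lambda and alpha cannot be swapped.
    val model = new ALS()
      .setImplicitPrefs(true) // train the implicit feedback variant
      .setRank(10)
      .setIterations(20)
      .setLambda(0.01)        // default taken from the three-parameter overload
      .setAlpha(alpha)
      .run(ratings)

    // Predicted preference of user 1 for item 1.
    println(model.predict(1, 1))

    sc.stop()
  }
}
```

Parameters that are not set explicitly (such as the number of blocks) keep whatever defaults the ALS class provides.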

Finally, you may not realize that there is pretty decent API documentation for ALS.

Perfect, the 'magic numbers' computation seems to work just fine! Thanks so much for the help!! — atellez, Sep 03 '14 at 20:18
