I am new to Spark machine learning. I'm experimenting with collaborative filtering using the ALS algorithm. I need some clarification about the rank, numIterations, and lambda parameters that are used to train the algorithm, and I'd like to know how to tune them to produce better predictions on both small and large datasets. Could somebody explain this?

- This always depends on the data. Use cross-validation to choose these. Some intuitions (usually right): higher rank needs more data; higher rank needs more regularization / a larger lambda. – sascha Aug 23 '17 at 15:28
- Got the idea. Thanks... @sascha – Suresh Kumar Aug 24 '17 at 01:05
1 Answer
From the documentation:
numBlocks is the number of blocks used to parallelize computation (set to -1 to auto-configure).
rank is the number of features to use (also referred to as the number of latent factors).
iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less.
numBlocks controls how the ratings matrix is partitioned into blocks so the computation can be parallelized.
rank is the number of latent (hidden) factors, i.e. how many features you would like to use. Read more here.
iterations is the number of repetitions of the alternating optimization you want to perform. Note that the parameter's name is iterations, not numIterations.
Read more about ALS here.
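To make the roles of rank, iterations, and lambda concrete, here is a minimal ALS sketch in plain NumPy. This is not Spark's implementation: it treats every entry of the matrix as an observed rating (a real recommender only fits the observed entries), and all names here are my own.

```python
import numpy as np

def als(R, rank=2, iterations=10, lam=0.1, seed=0):
    """Toy dense ALS: factor R (n_users x n_items) into U @ V.T.

    rank       -- number of latent factors per user/item
    iterations -- number of alternating sweeps
    lam        -- L2 regularization strength (Spark's 'lambda')
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    I = lam * np.eye(rank)
    for _ in range(iterations):
        # Fix V, solve a ridge regression for all user factors at once.
        U = R @ V @ np.linalg.inv(V.T @ V + I)
        # Fix U, solve for the item factors.
        V = R.T @ U @ np.linalg.inv(U.T @ U + I)
    return U, V

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
U, V = als(R, rank=2, iterations=20, lam=0.05)
print(np.round(U @ V.T, 1))   # rank-2 reconstruction of R
```

Each sweep alternates between the two closed-form ridge solves, which is why the objective keeps improving as iterations grows, and why a larger lam pulls the factors toward zero.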
I need to know how to tune the training parameters to improve predictions on both small and large datasets.
This always depends on the data. Use cross-validation to choose these.
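As a toy illustration of that advice, the sketch below grid-searches rank and lambda against a held-out set of entries. The zero-imputation shortcut and every name here are my assumptions, not Spark's API; in Spark ML you would express the same idea with CrossValidator and ParamGridBuilder over the ALS estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ratings from a true rank-3 model, plus noise.
U_true = rng.normal(size=(30, 3))
V_true = rng.normal(size=(40, 3))
R = U_true @ V_true.T + rng.normal(scale=0.1, size=(30, 40))
mask = rng.random(R.shape) < 0.8      # 80% train, 20% held out
R_train = np.where(mask, R, 0.0)      # toy shortcut: held-out entries become 0

def fit(R, rank, lam, iters=15):
    # Dense ALS sweep (treats every entry, including the zeros, as observed).
    rng_ = np.random.default_rng(0)
    U = rng_.normal(scale=0.1, size=(R.shape[0], rank))
    V = rng_.normal(scale=0.1, size=(R.shape[1], rank))
    I = lam * np.eye(rank)
    for _ in range(iters):
        U = R @ V @ np.linalg.inv(V.T @ V + I)
        V = R.T @ U @ np.linalg.inv(U.T @ U + I)
    return U, V

best = None
for rank in (1, 3, 6):
    for lam in (0.01, 0.1, 1.0):
        U, V = fit(R_train, rank, lam)
        # Score only on the 20% of entries the model never saw.
        err = R - U @ V.T
        rmse = np.sqrt(np.mean(err[~mask] ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, rank, lam)
print("best rmse=%.3f rank=%d lambda=%.2f" % best)
```

The point is the shape of the procedure, not the numbers: candidate (rank, lambda) pairs are compared only on data held out from training, so the winning pair is the one that generalizes rather than the one that memorizes.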
- Thanks gsamaras. I have the basic idea that ALS works on a matrix factorization model. I need to know how to tune the training parameters to improve predictions on both small and large datasets. – Suresh Kumar Aug 23 '17 at 12:27