I am new to Spark machine learning. I'm experimenting with collaborative filtering using the ALS algorithm. I need some clarification about the rank, numIterations, and lambda parameters that are used to train the algorithm, and I'd like to know how to tune them to produce better predictions on both small and large datasets. Could somebody explain this?

- This always depends on the data. Use cross-validation to choose these. Some intuitions (usually right): higher rank needs more data; higher rank needs more regularization / a larger lambda. – sascha Aug 23 '17 at 15:28
- Got the idea. Thanks... @sascha – Suresh Kumar Aug 24 '17 at 01:05
1 Answer
From the documentation:
numBlocks is the number of blocks used to parallelize computation (set to -1 to auto-configure).
rank is the number of features to use (also referred to as the number of latent factors).
iterations is the number of iterations of ALS to run. ALS typically converges to a reasonable solution in 20 iterations or less.
numBlocks controls how the ratings matrix is partitioned into blocks so the computation can be parallelized.
rank is the number of latent (hidden) factors, i.e. how many features you would like to use. Read more here.
iterations is the number of repetitions of the alternating optimization you want to perform. Note that the parameter's name is iterations, not numIterations.
Read more about ALS here.
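To make the roles of rank, iterations, and lambda concrete, here is a minimal ALS sketch in plain NumPy. This is not Spark's implementation: it treats every entry of the matrix as an observed rating (a real recommender only fits the observed entries), and all names here are my own.

```python
import numpy as np

def als(R, rank=2, iterations=10, lam=0.1, seed=0):
    """Toy dense ALS: factor R (n_users x n_items) into U @ V.T.

    rank       -- number of latent factors per user/item
    iterations -- number of alternating sweeps
    lam        -- L2 regularization strength (Spark's 'lambda')
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    I = lam * np.eye(rank)
    for _ in range(iterations):
        # Fix V, solve a ridge regression for all user factors at once.
        U = R @ V @ np.linalg.inv(V.T @ V + I)
        # Fix U, solve for the item factors.
        V = R.T @ U @ np.linalg.inv(U.T @ U + I)
    return U, V

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
U, V = als(R, rank=2, iterations=20, lam=0.05)
print(np.round(U @ V.T, 1))   # rank-2 reconstruction of R
```

Each sweep alternates between the two closed-form ridge solves, which is why the objective keeps improving as iterations grows, and why a larger lam pulls the factors toward zero.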
I need to know how to tune the training parameters to improve predictions on both small and large datasets.
This always depends on the data. Use cross-validation to choose these.
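As a toy illustration of that advice, the sketch below grid-searches rank and lambda against a held-out set of entries. The zero-imputation shortcut and every name here are my assumptions, not Spark's API; in Spark ML you would express the same idea with CrossValidator and ParamGridBuilder over the ALS estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ratings from a true rank-3 model, plus noise.
U_true = rng.normal(size=(30, 3))
V_true = rng.normal(size=(40, 3))
R = U_true @ V_true.T + rng.normal(scale=0.1, size=(30, 40))
mask = rng.random(R.shape) < 0.8      # 80% train, 20% held out
R_train = np.where(mask, R, 0.0)      # toy shortcut: held-out entries become 0

def fit(R, rank, lam, iters=15):
    # Dense ALS sweep (treats every entry, including the zeros, as observed).
    rng_ = np.random.default_rng(0)
    U = rng_.normal(scale=0.1, size=(R.shape[0], rank))
    V = rng_.normal(scale=0.1, size=(R.shape[1], rank))
    I = lam * np.eye(rank)
    for _ in range(iters):
        U = R @ V @ np.linalg.inv(V.T @ V + I)
        V = R.T @ U @ np.linalg.inv(U.T @ U + I)
    return U, V

best = None
for rank in (1, 3, 6):
    for lam in (0.01, 0.1, 1.0):
        U, V = fit(R_train, rank, lam)
        # Score only on the 20% of entries the model never saw.
        err = R - U @ V.T
        rmse = np.sqrt(np.mean(err[~mask] ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, rank, lam)
print("best rmse=%.3f rank=%d lambda=%.2f" % best)
```

The point is the shape of the procedure, not the numbers: candidate (rank, lambda) pairs are compared only on data held out from training, so the winning pair is the one that generalizes rather than the one that memorizes.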
- Thanks gsamaras. I have the basic idea that ALS works on a matrix factorization model. I need to know how to tune the training parameters to improve predictions on both small and large datasets. – Suresh Kumar Aug 23 '17 at 12:27