
I am trying to use ALS, but currently my data is limited to information about which items each user bought. So I filled the ALS model from Apache Spark with Ratings equal to 1 (one) whenever user X bought item Y (that is the only information I provided to the algorithm).

I tried training it (splitting the data into train/test/validation sets) and also tried training on all the data, but in the end the predictions were extremely similar for every user-item pair (the values differed only at the 5th or 6th decimal place, e.g. 0.86001 and 0.86002).

Thinking about it, maybe the cause is that I can only provide ratings equal to 1. Can ALS even be used in such an extreme situation?

Is there any trick with the ratings that I could use to fix this problem? I only have information about what was bought. I am going to get more data later, but for the moment I have to use some kind of collaborative filtering until I acquire it; in other words, I need to show users some kind of recommendation on the startup page. I chose ALS for that page, but maybe I should use something else, and if so, what exactly?

Of course I also tried changing parameters like the number of iterations, lambda, and rank.

zero323
Adrian

1 Answer


In this case, the key is that you must use trainImplicit, which ignores the Rating value. Otherwise you're asking it to predict ratings in a world where everyone rates everything 1. The right answer is invariably 1, so all your predictions come out nearly identical.
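To see why this matters, here is a toy pure-numpy sketch of the implicit-feedback formulation that trainImplicit is based on (Hu, Koren & Volinsky: "Collaborative Filtering for Implicit Feedback Datasets"). This is not Spark's actual implementation; the matrix sizes, hyperparameters, and the tiny purchase matrix are made up for illustration. A purchase is treated as a binary preference with a confidence weight c = 1 + alpha * r, instead of as a literal rating of 1:

```python
import numpy as np

def implicit_als(R, rank=2, alpha=40.0, reg=0.1, iters=10, seed=0):
    """Toy implicit-feedback ALS sketch.

    R: user x item matrix of purchase counts (0 = no purchase).
    Preference p_ui = 1 if r_ui > 0 else 0; confidence c_ui = 1 + alpha * r_ui.
    Alternates weighted ridge-regression solves for user and item factors.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    P = (R > 0).astype(float)                        # binary preference
    C = 1.0 + alpha * R                              # confidence weights

    for _ in range(iters):
        # fix Y, solve for each user's factors
        for u in range(n_users):
            Cu = np.diag(C[u])
            A = Y.T @ Cu @ Y + reg * np.eye(rank)
            X[u] = np.linalg.solve(A, Y.T @ Cu @ P[u])
        # fix X, solve for each item's factors
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            A = X.T @ Ci @ X + reg * np.eye(rank)
            Y[i] = np.linalg.solve(A, X.T @ Ci @ P[:, i])
    return X, Y

# toy data: users 0-1 buy from items 0-1; users 2-3 buy from items 2-3
R = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
X, Y = implicit_als(R)
scores = X @ Y.T  # predicted preference scores (not probabilities)
```

Because every observed cell carries a high confidence weight while unobserved cells carry weight 1, the recovered scores differentiate between items: user 1 (who bought item 0) scores the co-purchased item 1 higher than the unrelated item 2, which is exactly the structure that vanishes if you treat all purchases as explicit ratings of 1.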

Sean Owen
  • Thanks Sean, I will check this and report back on how it works. – Adrian Feb 11 '15 at 21:42
  • I got an RMSE of about 0.193 for very naive training (all the data was used both to train and to test). In addition, the predictions now differ, and at the moment it looks like it is finding proper relations. What is interesting to me is that I got a much better RMSE with a high alpha, like 10 or 50. Thanks very much for the help! – Adrian Feb 12 '15 at 03:49
  • You can't really use RMSE in this case. That's not what it is minimizing. – Sean Owen Feb 12 '15 at 07:36
  • Sean, how can I check whether I chose proper parameters (at the moment I am still building the application, so I cannot just check it in a production environment)? I thought that the prediction in this case equals confidence, and assumed that if the confidence is close to one then it is almost certain the item will be bought (at least that is my understanding). – Adrian Feb 12 '15 at 13:20
  • The output is not a probability, if that's what you mean. The values are generally in [0,1] but not always. You can use a metric like AUC to evaluate the model. Hyperparameter tuning is usually a matter of just trying lots of combinations. Ad: we wrote a book on lots of use cases like this with MLlib, including hyperparameter tuning with ALS. http://shop.oreilly.com/product/0636920035091.do – Sean Owen Feb 12 '15 at 13:36
  • Sean, I read the paragraph you mentioned and found that Spark implements AUC. Unfortunately I have a problem with using it, so I created a new question: here. If you have some time, please look at it (I saw your implementation, but maybe it is better to use Spark's implementation directly; I am going to have a large amount of data, so it may be better to use code that is directly supported. What do you think?). It would still be good to understand how to use that Spark implementation ;) – Adrian Feb 15 '15 at 19:05
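Since RMSE is not what the implicit model minimizes, a ranking metric like AUC (mentioned above) is a better fit: for one user, it is the probability that a randomly chosen held-out purchased item is scored above a randomly chosen non-purchased item. A minimal pure-Python sketch; the item IDs and model scores below are made up for illustration:

```python
def user_auc(positives, negatives, score):
    """AUC for one user: fraction of (purchased, non-purchased) item pairs
    where the purchased item gets the higher model score (ties count half)."""
    wins = ties = 0
    for p in positives:
        for n in negatives:
            if score[p] > score[n]:
                wins += 1
            elif score[p] == score[n]:
                ties += 1
    return (wins + 0.5 * ties) / (len(positives) * len(negatives))

# hypothetical scores for one user; items 0 and 2 were held-out purchases
scores = {0: 0.91, 1: 0.10, 2: 0.75, 3: 0.40, 4: 0.05}
print(user_auc([0, 2], [1, 3, 4], scores))  # prints 1.0
```

Averaging this over users (or using Spark's built-in binary-classification evaluation, as discussed in the comment above) gives a model-selection criterion you can use for hyperparameter tuning without a production deployment.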