An ALS (alternating least squares) recommendation model performs matrix factorization: it factorizes a users-by-items rating matrix into latent factors.
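For reference, here is a common form of the explicit-feedback objective that ALS minimizes. The notation is assumed here and does not come from the question: $r_{ui}$ is an observed rating, $x_u$ and $y_i$ are the user and item factor vectors, $\lambda$ is the regularization weight, and $K$ is the set of observed (user, item) pairs. Library implementations, Spark's included, may scale $\lambda$ differently, so treat this as the plain textbook version:

$$
\min_{X,Y} \sum_{(u,i)\in K} \left(r_{ui} - x_u^\top y_i\right)^2 + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)
$$

With $Y$ fixed, setting the gradient with respect to $x_u$ to zero gives a small least-squares solve per user, where $K_u$ is the set of items that user $u$ rated (and symmetrically for each $y_i$ with $X$ fixed):

$$
\Big(\sum_{i\in K_u} y_i y_i^\top + \lambda I\Big)\, x_u = \sum_{i\in K_u} r_{ui}\, y_i
$$

In this formulation, only pairs in $K$ (the observed ratings) enter either equation.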
A matrix of 3 users and 3 items would look like this:
users | item_1 | item_2 | item_3 |
---|---|---|---|
user_1 | NA | 4 | 1 |
user_2 | 4 | 3 | 0 |
user_3 | NA | 1 | NA |
My dataframe, however, is in long format and starts like this:
users | items | rating |
---|---|---|
user_1 | item_2 | 4 |
user_1 | item_3 | 1 |
user_2 | item_1 | 4 |
user_2 | item_2 | 3 |
user_2 | item_3 | 0 |
user_3 | item_2 | 1 |
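A small plain-Python sketch (no Spark involved; `to_matrix` is a hypothetical helper) showing that the long-format table already encodes the whole matrix above: every (user, item) cell that is absent from the long table is exactly an NA cell.

```python
# Long-format ratings, copied from the table above: (user, item, rating)
ratings = [
    ("user_1", "item_2", 4),
    ("user_1", "item_3", 1),
    ("user_2", "item_1", 4),
    ("user_2", "item_2", 3),
    ("user_2", "item_3", 0),
    ("user_3", "item_2", 1),
]


def to_matrix(rows):
    """Arrange long-format rows into the users-by-items view; unobserved cells are None (NA)."""
    users = sorted({u for u, _, _ in rows})
    items = sorted({i for _, i, _ in rows})
    observed = {(u, i): r for u, i, r in rows}
    # dict.get returns None for pairs that never appear in the long table
    return users, items, [[observed.get((u, i)) for i in items] for u in users]


users, items, matrix = to_matrix(ratings)
for user, row in zip(users, matrix):
    print(user, ["NA" if r is None else r for r in row])
```

This prints one line per user, matching the 3x3 matrix at the top, with `NA` in the three unobserved cells.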
My question is: before passing my dataframe to the ALS module, do I need to transform it so that I end up with a structure like this:
users | items | rating |
---|---|---|
user_1 | item_1 | NA |
user_1 | item_2 | 4 |
user_1 | item_3 | 1 |
user_2 | item_1 | 4 |
user_2 | item_2 | 3 |
user_2 | item_3 | 0 |
user_3 | item_1 | NA |
user_3 | item_2 | 1 |
user_3 | item_3 | NA |
Or will the `ml.recommendation.ALS` function, under the hood, create those observations for the user-item pairs without interactions, such as:
users | items | rating |
---|---|---|
user_1 | item_1 | NA |
If it does not, one way to produce the expected table would be to pivot the dataframe and then unpivot it, but that would produce a huge users-by-items matrix. However, judging from the examples in the documentation, this pivot-then-unpivot step does not seem to be necessary.
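To make the size concern concrete, here is a plain-Python sketch of what pivot-then-unpivot amounts to (the `densify` helper is hypothetical, not a Spark API): it emits one row for every (user, item) pair, filling unobserved pairs with `None`, so the row count is always `n_users * n_items` regardless of how many ratings exist.

```python
from itertools import product

# Observed long-format rows from the question: (user, item, rating)
ratings = [
    ("user_1", "item_2", 4),
    ("user_1", "item_3", 1),
    ("user_2", "item_1", 4),
    ("user_2", "item_2", 3),
    ("user_2", "item_3", 0),
    ("user_3", "item_2", 1),
]


def densify(rows):
    """Pivot-then-unpivot equivalent: one row per (user, item) pair, None where unobserved."""
    users = sorted({u for u, _, _ in rows})
    items = sorted({i for _, i, _ in rows})
    observed = {(u, i): r for u, i, r in rows}
    # product() enumerates the full Cartesian grid of users x items
    return [(u, i, observed.get((u, i))) for u, i in product(users, items)]


dense = densify(ratings)
print(len(ratings), len(dense))  # 6 observed rows become 9 = 3 users x 3 items
```

On the toy data this reproduces the 9-row table above. At realistic scale the blow-up dominates: with, say, 1,000,000 users and 100,000 items, the densified table would have 10^11 rows even if only a few million ratings were observed.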