Does pyspark.ml.recommendation.ALS create a pivot table under the hood?

Question

An ALS recommendation model performs matrix factorization: it factorizes a users-by-items matrix into latent factors.

A matrix of 3 users and 3 items would look like this:

users	item_1	item_2	item_3
user_1	NA	4	1
user_2	4	3	0
user_3	NA	1	NA

My dataframe starts out like this:

users	items	rating
user_1	item_2	4
user_1	item_3	1
user_2	item_1	4
user_2	item_2	3
user_2	item_3	0
user_3	item_2	1

My question is: before passing my dataframe to the ALS module, do I need to transform it so that it ends up with a structure like this:

users	items	rating
user_1	item_1	NA
user_1	item_2	4
user_1	item_3	1
user_2	item_1	4
user_2	item_2	3
user_2	item_3	0
user_3	item_1	NA
user_3	item_2	1
user_3	item_3	NA

Or will the ml.recommendation.ALS function create, under the hood, the observations for the user-item pairs without interactions? Such as:

users	items	rating
user_1	item_1	NA

If it does not, one way to produce the expected table would be to pivot it and then unpivot it, but that would produce a very large users-by-items matrix. However, from the examples in the documentation, it seems that this process (pivot, then unpivot) is not necessary.

score 0 · Answer 1 · answered May 31 '22 at 10:40


Yes, you are right: the pivot and unpivot step is not necessary.

After you train the ALS model, use the fitted model to predict the "missing interactions".

Thus, the term "fill" (in your sentence "ml.recommendation.ALS module fill those missing interactions") is not appropriate; you should use the term "predict".


lanenok


I am not arguing about the output from the predictions of the model, I do know what ALS performs. I'm arguing about the structure of the input in the ALS pyspark function. – Gustavomoty Jun 01 '22 at 15:33
