
test (a table with columns: user_id, item_id, rating, with 6.2M rows)

from pyspark.ml.recommendation import ALS

als = ALS(userCol="user_id",
          itemCol="item_id",
          ratingCol="rating",
          coldStartStrategy="drop",  # rows with a NaN prediction are removed
          implicitPrefs=True)
model = als.fit(train)
predictions = model.transform(test)

predictions (a table with columns: user_id, item_id, rating, prediction, but with only 1.7M rows)

Why did model.transform(test) drop the rest of the rows? It should have been able to calculate a prediction score for every (user_id, item_id) combination in test, right?

Is it because I have used coldStartStrategy="drop"?

  • But if there is a rating calculated for all user_id, item_id combinations in test, no row should be dropped, yes?
Anmol Deep

1 Answer


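Yes, it is entirely because I used coldStartStrategy="drop". ALS learns factors only for the users and items that appear in the training data, so it can score a (user_id, item_id) pair only when both IDs had at least one interaction in train. For any other pair the prediction is NaN, and the "drop" strategy removes those rows from the output of model.transform(test).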
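The filtering rule can be illustrated with a small plain-Python sketch. This is only a toy stand-in for the behaviour described above, not Spark's implementation: the data, the `predict` placeholder, and the `transform` helper are all hypothetical names invented for the illustration.

```python
import math

# Hypothetical toy data: (user_id, item_id, rating) triples.
train = [(1, 10, 4.0), (1, 11, 3.0), (2, 10, 5.0)]
test = [(1, 10, 4.0), (2, 11, 2.0), (3, 10, 1.0), (2, 12, 5.0)]

# A fitted model only has factors for IDs that appeared in training.
known_users = {u for u, _, _ in train}
known_items = {i for _, i, _ in train}


def predict(user_id, item_id):
    """Stand-in for the model: NaN when either ID was never seen in training."""
    if user_id in known_users and item_id in known_items:
        return 1.0  # placeholder score, not a real ALS prediction
    return math.nan


def transform(rows, cold_start_strategy="nan"):
    """Score every row; with "drop", remove rows whose prediction is NaN."""
    scored = [(u, i, r, predict(u, i)) for u, i, r in rows]
    if cold_start_strategy == "drop":
        scored = [row for row in scored if not math.isnan(row[3])]
    return scored


all_rows = transform(test, "nan")  # keeps every test row
kept = transform(test, "drop")     # user 3 and item 12 are unseen, so 2 rows go
print(len(all_rows), len(kept))    # prints "4 2"
```

To confirm the same thing on the real data, one option is to fit with coldStartStrategy="nan" and count the rows whose prediction is NaN. Another is to left-anti-join test against the distinct user_id and item_id values in train. Either count should account for the gap between 6.2M and 1.7M rows.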

Anmol Deep