
I built a recommendation model on a user-item transactional dataset where each transaction is represented by a 1 (i.e., binary implicit feedback).

from lightfm import LightFM

model = LightFM(learning_rate=0.05, loss='warp')
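For reference, this is roughly how the numbers below were produced; the interaction matrix name, the test split fraction and the number of epochs here are placeholders rather than my exact settings:

from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import precision_at_k, recall_at_k, auc_score

# Hold out part of the user-item interaction matrix for testing.
train, test = random_train_test_split(interactions, test_percentage=0.2)

model.fit(train, epochs=30, num_threads=2)

# The evaluation functions return one value per user, so report the mean.
print("Train precision@3:", precision_at_k(model, train, k=3).mean())
print("Test precision@3: ", precision_at_k(model, test, k=3).mean())
print("Train AUC:", auc_score(model, train).mean())
print("Test AUC: ", auc_score(model, test).mean())
print("Train recall@3:", recall_at_k(model, train, k=3).mean())
print("Test recall@3: ", recall_at_k(model, test, k=3).mean())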

Here are the results

Train precision at k=3:  0.115301
Test precision at k=3:  0.0209936

Train auc score:  0.978294
Test auc score : 0.810757

Train recall at k=3:  0.238312330233
Test recall at k=3:  0.0621618086561

Can anyone help me interpret these results? How is it that I am getting such a good AUC score but such poor precision/recall? The precision/recall get even worse with the 'bpr' (Bayesian personalised ranking) loss.

Prediction task

import numpy as np

users = [0]
items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
model.predict(users, items)

Result

array([-1.45337546, -1.39952552, -1.44265926, -0.83335167, -0.52803332,
   -1.06252205, -1.45194077, -0.68543684])

How do I interpret the prediction scores?

Thanks

Manas

1 Answer


When it comes to the difference between precision@K and AUC, you may want to have a look at my answer here: Evaluating the LightFM Recommendation Model.

The scores themselves do not have a defined scale and are not interpretable. They only make sense in the context of defining a ranking over items for a given user, with higher scores denoting a stronger predicted preference.
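For example, a ranking can be obtained by sorting the items by their predicted scores in descending order (reusing the item array from your question; the user id 0 is passed as a plain integer):

import numpy as np

items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
scores = model.predict(0, items)

# Higher score means stronger predicted preference, so sort descending.
ranking = items[np.argsort(-scores)]

With the scores you posted, this puts item 13437 (score -0.528) first and item 13433 (score -1.453) last.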

Maciej Kula
  • Is it somehow possible to get ratings as predictions? I mean, if we train the model on ratings (between 1 and 10), how can we get the same kind of output? Thanks – Mez13 Apr 16 '20 at 15:15