
What is the right way to set up the data when feeding it to a LightFM model in cases where I have additional implicit data on extra items/products? For example, I have 100k users x 200 items of interaction data, but in the real application I want the model to provide recommendations from only 50 of the 200 items. So how do I set up the data? I am thinking of two cases, but I am not sure which is the right approach:

Case 1: Feed the whole matrix (100k users x 200 items) directly as the interactions argument in LightFM. This way the learning is more collaborative.

Case 2: Feed only the (100k users x 50 items) matrix as the interactions argument and use the (100k users x 150 items) matrix as user_features. This way the learning is more content-based.

Which one is correct? Also, for case 1, is there a way for the evaluation utility functions (precision, recall, etc.) to consider only selected items, i.e. take the top-k recommendations only from the 50 items, never recommend the other 150, and compute precision, recall, etc. on those?

– bninopaul

1 Answer


You should follow case 1: train the model on the entire interactions data. When making predictions, you can pass the indices of the required (50) items as a parameter to model.predict.

From the LightFM documentation, you can see that model.predict takes item ids as a parameter (which will be the ids of your 50 items in this case).

https://making.lyst.com/lightfm/docs/_modules/lightfm/lightfm.html#LightFM.predict

def predict(self, user_ids, item_ids, item_features=None,
            user_features=None, num_threads=1):
    """
    Compute the recommendation score for user-item pairs.

    Arguments
    ---------

    user_ids: integer or np.int32 array of shape [n_pairs,]
         single user id or an array containing the user ids for the
         user-item pairs for which a prediction is to be computed
    item_ids: np.int32 array of shape [n_pairs,]
         an array containing the item ids for the user-item pairs for which
         a prediction is to be computed
    user_features: np.float32 csr_matrix of shape [n_users, n_user_features], optional
         Each row contains that user's weights over features
    item_features: np.float32 csr_matrix of shape [n_items, n_item_features], optional
         Each row contains that item's weights over features
    num_threads: int, optional
         Number of parallel computation threads to use. Should
         not be higher than the number of physical cores.
    """
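
For instance, a minimal sketch of training on the full matrix and then scoring only the items you want to recommend (the data here is a random stand-in just so it runs, and candidate_items is a hypothetical index array for your 50 items):

import numpy as np
import scipy.sparse as sp
from lightfm import LightFM

# Toy stand-in for the real data: 100k users x 200 items of
# implicit feedback (random here, just so the sketch runs).
interactions = sp.random(100_000, 200, density=0.001,
                         format='coo', random_state=0)
interactions.data[:] = 1.0  # implicit: interacted or not

# Case 1: train on the full 200-item interaction matrix.
model = LightFM(loss='warp')
model.fit(interactions, epochs=10, num_threads=4)

# Hypothetical: indices (into the 200-item axis) of the 50
# items you actually want to recommend from.
candidate_items = np.arange(50, dtype=np.int32)

# Score one user against only those 50 items and rank them.
user_id = 0
scores = model.predict(user_id, candidate_items)
top_10 = candidate_items[np.argsort(-scores)[:10]]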
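
As for evaluating only over the 50 items: LightFM's built-in precision_at_k and recall_at_k rank over all items in the interaction matrix and don't take a candidate subset, so restricting the top-k needs a small manual computation. A sketch under that assumption, reusing the names above (test_interactions is an assumed held-out 100k x 200 split):

def precision_at_k_subset(model, test_interactions, candidate_items, k=10):
    """Precision@k where the top k are drawn only from candidate_items."""
    test = test_interactions.tocsr()
    candidates = set(candidate_items.tolist())
    precisions = []
    for user_id in range(test.shape[0]):
        # Ground-truth positives for this user, restricted to candidates.
        positives = set(test[user_id].indices.tolist()) & candidates
        if not positives:
            continue  # skip users with no relevant candidate items
        scores = model.predict(user_id, candidate_items)
        top_k = set(candidate_items[np.argsort(-scores)[:k]].tolist())
        precisions.append(len(positives & top_k) / k)
    return np.mean(precisions)

With ~100k users this per-user loop is slow; batching all (user, item) pairs into a single model.predict call would be faster, but the loop keeps the logic clear.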
  • yes, I see, but I was just wondering which approach is more effective. Since the 150 items are not part of the products I would be recommending, would it be better to include them in the interactions data or feed them as user_features, and which yields better results? I guess I just have to try to find out, but thanks anyway. – bninopaul Jun 08 '20 at 04:52