How to understand the dimensions of the sparse matrix (Compressed Sparse Row format) generated in lightFM and map them to features?

Question

I am working on a lightFM hybrid recsys model with both user and item metadata features. These user and item features have the format:

(0, [feature1, feature2, ... feature n])

Passing them to dataset.build_user_features() generates a sparse matrix of the features in CSR format.

It has the shape:

number of users x [Some_number_denoting_dimension] - For user features
number of items x [Some_number_denoting_dimension] - For item features

E.g., for the 6276 items in my dataset:

<6276x24103 sparse matrix of type '<class 'numpy.float32'>' with 25099 stored elements in Compressed Sparse Row format>
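One possible reading of the second dimension, assuming the Dataset was created with its default item_identity_features=True: every item gets its own identity column, and each distinct metadata feature name gets one more column. Under that assumption, 24103 would be 6276 identity columns plus 17827 distinct feature names. The pure-Python sketch below mimics that bookkeeping on toy data; build_feature_mapping and the sample items are hypothetical and not part of lightFM.

```python
def build_feature_mapping(item_ids, item_features, identity_features=True):
    """Assign one column index to each identity feature and each metadata feature.

    Hypothetical helper that imitates the assumed bookkeeping; not a lightFM API.
    """
    mapping = {}
    if identity_features:
        # One column per item, so each item can learn its own embedding.
        for item_id in item_ids:
            mapping.setdefault(item_id, len(mapping))
    # One column per distinct feature name, in order of first appearance.
    for _, features in item_features:
        for name in features:
            mapping.setdefault(name, len(mapping))
    return mapping


# Toy data in the same (id, [feature, ...]) format as in the question.
item_ids = [0, 1, 2]
item_features = [
    (0, ["genre:rock", "year:1999"]),
    (1, ["genre:rock", "year:2005"]),
    (2, ["genre:jazz"]),
]

mapping = build_feature_mapping(item_ids, item_features)
n_columns = len(mapping)  # 3 identity columns + 4 distinct feature names
print(n_columns)
```

With the real library, dataset.mapping() returns four dictionaries (user ids, user features, item ids, item features); the length of the item-feature dictionary should match the width of the item feature matrix.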

How is the second dimension (24103) generated? I want to understand this because lightFM also exposes the learned feature weights:

model.user_embeddings.shape

model.item_embeddings.shape

The item embeddings have the shape (24103, 100), which follows the sparse matrix dimensions: 24103 is the feature dimension of the sparse matrix and 100 is the embedding size (no_components) set when creating the model.
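To trace a row of the embedding matrix back to a feature name, the feature-to-column dictionary can be inverted. The sketch below is only an illustration: feature_map and embeddings are tiny made-up stand-ins for the item-feature dictionary from dataset.mapping() and for model.item_embeddings, and ranking by vector norm is a rough heuristic assumed here, not an established measure of feature importance.

```python
import math

# Hypothetical stand-in for the item-feature dictionary (feature name -> column index).
feature_map = {"item_0": 0, "item_1": 1, "genre:rock": 2, "genre:jazz": 3}

# Hypothetical stand-in for model.item_embeddings: one row per feature column.
embeddings = [
    [0.1, 0.0],
    [0.0, 0.2],
    [0.6, 0.8],
    [0.3, 0.4],
]

# Invert the mapping so a row index can be traced back to its feature name.
index_to_feature = {idx: name for name, idx in feature_map.items()}


def embedding_norm(vector):
    """Euclidean length of one embedding row."""
    return math.sqrt(sum(v * v for v in vector))


# Pair every feature name with the norm of its embedding, largest first.
ranked = sorted(
    ((index_to_feature[i], embedding_norm(row)) for i, row in enumerate(embeddings)),
    key=lambda pair: pair[1],
    reverse=True,
)

top_feature = ranked[0][0]
print(top_feature)
```

The same inversion applies to the user side, using the user-feature dictionary together with model.user_embeddings.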

I am keen to understand:

How is this dimension created for the sparse matrix?
How can I trace this information back to get the feature weights/feature importance from the data for explainability?

Any advice or leads would be of great help. Thanks.
