I am working on a LightFM hybrid recommender model with both user and item metadata features. These user and item features have the format:
(0, [feature1, feature2, ... feature n])
Passing them to dataset.build_user_features() (and, for items, dataset.build_item_features()) generates a sparse feature matrix in CSR format.
Its shape is:
- number of users x [Some_number_denoting_dimension] - For user features
- number of items x [Some_number_denoting_dimension] - For item features
E.g., for the 6276 items in my dataset, the item feature matrix is:
<6276x24103 sparse matrix of type '<class 'numpy.float32'>' with 25099 stored elements in Compressed Sparse Row format>
How is the second dimension (24103) generated? I want to understand this because LightFM also exposes the learned embeddings (feature weights):
model.user_embeddings.shape
model.item_embeddings.shape
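For model.item_embeddings the shape is (24103, 100), so its rows appear to line up with the columns of the sparse feature matrix: 24103 is the feature dimension of the sparse matrix, and 100 is the number of latent components (no_components, which is set when constructing the model rather than in model.fit()).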
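To make the question concrete, here is my current guess at how the column count arises, written as a plain-Python sketch rather than LightFM's actual code. It assumes the Dataset adds one identity column per item by default (I believe this is controlled by item_identity_features) and then one column per distinct feature name. The helper build_feature_mapping and the toy data are hypothetical. If the guess holds, 24103 - 6276 = 17827 would be the number of distinct feature names in my data.

```python
# Plain-Python sketch of how the feature columns might be assigned.
# Assumption: one identity column per item, then one per distinct feature name.

def build_feature_mapping(item_ids, item_features, identity_features=True):
    """Return {feature_name: column_index}, assigned in first-seen order."""
    mapping = {}
    if identity_features:
        # Each item gets its own "identity" column, keyed by the item id.
        for item_id in item_ids:
            mapping.setdefault(item_id, len(mapping))
    # Every distinct feature name then gets the next free column.
    for _, features in item_features:
        for feature in features:
            mapping.setdefault(feature, len(mapping))
    return mapping


# Toy data in the same (id, [features]) format as in the question.
item_ids = [0, 1, 2]
item_features = [
    (0, ["genre:rock", "year:1999"]),
    (1, ["genre:rock"]),
    (2, ["genre:jazz", "year:2005"]),
]

mapping = build_feature_mapping(item_ids, item_features)
shape = (len(item_ids), len(mapping))
print(shape)  # (3, 7): 3 identity columns + 4 distinct feature names
```

With the identity columns switched off, the same toy data would give only 4 columns.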
I am keen to understand:
- How is this dimension created for the sparse matrix?
- How can I trace it back to get per-feature weights or feature importance from the data, for explainability?
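For the second point, what I have in mind is roughly the sketch below: invert a feature-name-to-row mapping (I believe dataset.mapping() returns one for item features as its fourth element) and pair each row of model.item_embeddings with its feature name. The mapping and embedding values here are made-up stand-ins, and ranking by vector norm is only a heuristic I am assuming, not an established importance measure.

```python
import math

# Hypothetical stand-ins: in a real run these would come from the fitted
# model and the dataset, not be written by hand.
feature_to_row = {"item_0": 0, "item_1": 1, "genre:rock": 2, "genre:jazz": 3}
embeddings = [
    [0.1, 0.0],  # row 0
    [0.0, 0.2],  # row 1
    [0.6, 0.8],  # row 2
    [0.3, 0.4],  # row 3
]

# Invert the mapping so each embedding row can be labelled with its feature.
row_to_feature = {row: name for name, row in feature_to_row.items()}


def rank_features_by_norm(embeddings, row_to_feature):
    """Pair each row with its feature name and sort by L2 norm, largest first."""
    scored = [
        (row_to_feature[row], math.sqrt(sum(v * v for v in vector)))
        for row, vector in enumerate(embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_features_by_norm(embeddings, row_to_feature)
for name, norm in ranking:
    print(name, round(norm, 3))
```

Is this a sensible way to read the embeddings, or is there a better-supported approach?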
Any advice or leads would be a great help. Thanks.