I have a last.fm dataset composed of songs and the tags that users have given them. I want to cluster the songs based on their tags.
The dataset has 200k songs and 119k distinct tags. I was previously thinking of building an N×M matrix, where N is the number of songs and M is the number of tags, in which each entry is 0 or 1 depending on whether that tag is present for that song. However, the huge size of this matrix stopped me from doing so. I have considered applying SVD to reduce the dimensionality before clustering, but I am not sure whether that is the best approach.
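For what it's worth, a song-by-tag matrix like this is extremely sparse, so it does not have to be stored densely: a sparse format keeps only the non-zero (song, tag) pairs instead of all 200k × 119k ≈ 2.4 × 10^10 cells. Below is a minimal sketch of the SVD-then-cluster idea, assuming the data can be loaded as a dict mapping each song to its list of tags. The toy songs and tags, the function names `build_song_tag_matrix` and `cluster_songs`, and the parameter values are all hypothetical. It uses SciPy's sparse `svds` (truncated SVD) followed by `kmeans2` (k-means); normalising the reduced rows so that k-means groups songs roughly by cosine similarity is an added assumption, not part of the original plan.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds


def build_song_tag_matrix(song_tags):
    """Build a sparse binary song x tag matrix from a hypothetical {song: [tags]} dict."""
    songs = list(song_tags)
    tag_index = {}
    rows, cols = [], []
    for i, song in enumerate(songs):
        # dict.fromkeys drops repeated tags (keeping order), so every entry stays 0/1
        for tag in dict.fromkeys(song_tags[song]):
            j = tag_index.setdefault(tag, len(tag_index))
            rows.append(i)
            cols.append(j)
    data = np.ones(len(rows), dtype=np.float64)
    # Only the non-zero (song, tag) pairs are stored
    matrix = csr_matrix((data, (rows, cols)), shape=(len(songs), len(tag_index)))
    return matrix, songs, tag_index


def cluster_songs(matrix, n_components, init_rows):
    """Reduce with truncated SVD, then run k-means on the reduced song vectors."""
    # svds works directly on the sparse matrix; n_components must be < min(matrix.shape)
    u, s, _ = svds(matrix, k=n_components)
    latent = u * s  # each row is a song's coordinates in the reduced space
    # Assumption: L2-normalise rows so k-means effectively compares songs by cosine similarity
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    latent = latent / np.where(norms == 0, 1.0, norms)
    # Initial centroids are taken from hand-picked songs only to keep this toy run deterministic
    _, labels = kmeans2(latent, latent[init_rows], minit="matrix")
    return labels


# Hypothetical toy data: two obvious groups of songs with disjoint tags
song_tags = {
    "song_a": ["rock", "guitar", "90s"],
    "song_b": ["rock", "guitar"],
    "song_c": ["rock", "90s", "grunge"],
    "song_d": ["electronic", "dance"],
    "song_e": ["electronic", "house", "dance"],
    "song_f": ["house", "dance"],
}

if __name__ == "__main__":
    matrix, songs, tag_index = build_song_tag_matrix(song_tags)
    labels = cluster_songs(matrix, n_components=2, init_rows=[0, 5])
    print({song: int(label) for song, label in zip(songs, labels)})
```

On the real data, `n_components` would probably be in the tens to hundreds and the number of clusters much larger; hand-picking initial songs via `init_rows` is only there for the toy example, and `minit="++"` or several random restarts would be more typical. scikit-learn's `TruncatedSVD` (which accepts sparse input) together with `KMeans` or `MiniBatchKMeans` is an alternative to the SciPy calls used here.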
Does anybody know of any work in the literature that performs this kind of clustering? Or does anyone have other ideas for my problem?
Thank you very much in advance.