I am looking to compute similarities between users and text documents using their topic representations. That is, each document and each user is represented by a vector over topics (e.g. Neuroscience, Technology), where each entry indicates how relevant that topic is to the user or document.
My goal is to compute the similarity between these vectors, so that I can find similar users, find similar articles, and recommend articles to users.
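For reference, Pearson correlation between two topic vectors is the dot product of the vectors after each is centered to zero mean and scaled to unit length. The minimal sketch below assumes users and articles are stored as rows of dense NumPy arrays over the same topic columns; the names `pearson_rows`, `users` and `articles` are hypothetical, and the data is random toy data.

```python
import numpy as np

def pearson_rows(a, b):
    """Pearson correlation between every row of a and every row of b.

    Each row is centered to zero mean and scaled to unit L2 norm; after that,
    the correlation of two rows is simply their dot product.
    """
    def standardize(m):
        m = m - m.mean(axis=1, keepdims=True)
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # constant rows get similarity 0 instead of NaN
        return m / norms
    return standardize(a) @ standardize(b).T

rng = np.random.default_rng(0)
users = rng.random((3, 5))     # 3 users x 5 topics (toy data)
articles = rng.random((4, 5))  # 4 articles x 5 topics (toy data)
sim = pearson_rows(users, articles)  # sim[i, j] = correlation of user i with article j
```

For non-constant rows this matches the user-vs-article block of `np.corrcoef(users, articles)`, but it only computes the block that is actually needed.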
I have tried using Pearson correlation, but it takes too much memory and time once there are about 40k articles and the vectors have around 10k entries.
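To put the memory numbers in perspective, assuming dense float64 arrays: the 40k x 10k topic matrix alone is 3.2 GB, and a full 40k x 40k similarity matrix is 12.8 GB. The sketch below shows one way to bound memory, assuming only the k most similar items per row are needed; `top_k_similar`, `k` and `chunk` are hypothetical names, and the switch to float32 is an assumption that halves memory at some cost in precision. It computes the Pearson similarity matrix one block of rows at a time and keeps only the top-k indices from each block.

```python
import numpy as np

n_articles, n_topics = 40_000, 10_000
input_gb = n_articles * n_topics * 8 / 1e9       # float64 topic matrix: 3.2 GB
full_sim_gb = n_articles * n_articles * 8 / 1e9  # dense float64 similarities: 12.8 GB

def standardize(m):
    # Center each row and scale it to unit length, so dot products are Pearson r.
    m = m - m.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for constant rows
    return m / norms

def top_k_similar(x, k=10, chunk=1000):
    """For each row of x, return indices of the k most correlated other rows."""
    z = standardize(x.astype(np.float32))
    n = z.shape[0]
    result = np.empty((n, k), dtype=np.int64)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        block = z[start:stop] @ z.T  # (rows in chunk, n) slice of the full matrix
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf  # skip self
        idx = np.argpartition(-block, k - 1, axis=1)[:, :k]  # k largest, unordered
        order = np.argsort(-np.take_along_axis(block, idx, axis=1), axis=1)
        result[start:stop] = np.take_along_axis(idx, order, axis=1)  # sort descending
    return result
```

With `chunk=1000` and 40k rows, each float32 block is about 160 MB. Note that this reduces peak memory, not the total amount of arithmetic: every pair of rows is still compared once.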
I am using numpy.
Is there a better way to do this, or is this cost unavoidable on a single machine?
Thank you.