My pandas DataFrame looks something like this:
Movieid  review  movieRating  wordEmbeddingVector
1        "text"  4            [100-dimensional vector]
I am trying to run a doc2vec implementation, and I want to group by movie id, take the sum of the vectors in wordEmbeddingVector, and compute the cosine similarity between each summed vector and an input vector. I tried:
movie_groupby = movie_data.groupby('movie_id').agg(lambda v: cosineSimilarity(np.sum(movie_data['textvec']), inputvector))
But it ran for ages, and I thought I might be doing something wrong, so I removed the similarity function and just grouped and summed. That doesn't finish either (it's been over an hour now). Am I doing something wrong, or is it actually just that slow? I have 135,392 rows in my DataFrame, so it's not massive.
movie_groupby = movie_data.groupby('movie_id').agg(lambda v: np.sum(movie_data['textvec']))
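For what it's worth, this is the result I'm aiming for, written out as a small sketch (cosineSimilarity here is just standard cosine similarity via scipy, and inputvector stands in for my real doc2vec query vector):

import numpy as np
from scipy.spatial.distance import cosine

def cosineSimilarity(a, b):
    # scipy's cosine() is a distance, so similarity = 1 - distance
    return 1.0 - cosine(a, b)

inputvector = np.random.rand(100)  # placeholder for the real query vector

# For each movie: stack that group's vectors, sum them elementwise,
# then compare the summed vector against the input vector.
# (v here is the per-group Series that pandas passes to the lambda.)
movie_groupby = movie_data.groupby('movie_id')['textvec'].agg(
    lambda v: cosineSimilarity(np.vstack(v.to_list()).sum(axis=0), inputvector)
)

That should give one similarity score per movie_id.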
Much appreciated!