
So I have a huge TF-IDF matrix with more than a million records, and I would like to compute the cosine similarity of this matrix with itself. I am running the code on Colab, but I am not sure how best to make use of the GPU that Colab provides.

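Sequential code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

tf = TfidfVectorizer()  # assumed: tf is a TfidfVectorizer
tfidf_matrix = tf.fit_transform(df['categories'])
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)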

Is there a way to parallelise this code, using JIT compilation or some other approach?
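Before parallelising, it helps to note the output size: linear_kernel returns a dense n × n array, which for n = 1,000,000 is 10^12 float64 values (about 8 TB), more than Colab's RAM or GPU can hold. Because TfidfVectorizer L2-normalises rows by default (norm='l2'), tfidf_matrix @ tfidf_matrix.T already equals the cosine similarity, so one option is to compute it block by block and keep only the top k neighbours of each row. A minimal CPU sketch with SciPy, assuming you only need those top k matches; topk_cosine_chunked, k and chunk_size are hypothetical names chosen for illustration:

```python
import numpy as np
import scipy.sparse as sp


def topk_cosine_chunked(X, k=10, chunk_size=1000):
    """Return (indices, scores) of the k most similar rows for every row of X.

    Assumes X is a sparse matrix whose rows are already L2-normalised
    (TfidfVectorizer's default), so X @ X.T is the cosine similarity.
    """
    X = sp.csr_matrix(X)
    n = X.shape[0]
    k = min(k, n)
    top_idx = np.empty((n, k), dtype=np.int64)
    top_val = np.empty((n, k), dtype=np.float64)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Only this (end - start) x n block of similarities is held in memory.
        block = (X[start:end] @ X.T).toarray()
        # Unordered top-k per row, then sort those k by descending score.
        part = np.argpartition(-block, k - 1, axis=1)[:, :k]
        vals = np.take_along_axis(block, part, axis=1)
        order = np.argsort(-vals, axis=1)
        top_idx[start:end] = np.take_along_axis(part, order, axis=1)
        top_val[start:end] = np.take_along_axis(vals, order, axis=1)
    return top_idx, top_val
```

For example, `idx, scores = topk_cosine_chunked(tfidf_matrix, k=10)`. The row blocks are independent of each other, so the loop could also be split across worker processes.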

kb hithesh

1 Answer

  1. Try simple PyTorch code like this example from the sentence-transformers library: https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/util.py#L31, or just import that function from the library.

  2. Consider the cuML library, which uses CUDA acceleration: https://docs.rapids.ai/api/cuml/nightly/api.html
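A minimal sketch of option 1 in plain PyTorch, assuming the TF-IDF rows have already been turned into a dense float tensor (for example after dimensionality reduction with TruncatedSVD, since densifying a million-row TF-IDF matrix over its full vocabulary is unlikely to fit in GPU memory). It follows the idea behind the linked sentence-transformers code (normalise the rows, then take a matrix product), but processes rows in chunks and keeps only the top k matches per row, because the full n × n result would not fit either. The function name topk_cosine_torch and its k/chunk_size parameters are hypothetical, not part of any library:

```python
import torch
import torch.nn.functional as F


def topk_cosine_torch(emb, k=10, chunk_size=4096, device=None):
    """Chunked top-k cosine similarity, on the GPU when one is available.

    Assumes `emb` is a dense 2-D float tensor of shape (n, d).
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Unit-length rows, so a plain matrix product gives cosine similarity.
    emb = F.normalize(emb.float().to(device), p=2, dim=1)
    n = emb.shape[0]
    k = min(k, n)
    idx_chunks, val_chunks = [], []
    for start in range(0, n, chunk_size):
        # (chunk x n) block of similarities, computed on `device`.
        sims = emb[start:start + chunk_size] @ emb.T
        # torch.topk returns values sorted in descending order by default.
        vals, idx = torch.topk(sims, k, dim=1)
        idx_chunks.append(idx.cpu())
        val_chunks.append(vals.cpu())
    return torch.cat(idx_chunks), torch.cat(val_chunks)
```

The chunk_size trades GPU memory for fewer kernel launches; on Colab it is worth increasing it until the (chunk_size × n) block no longer fits.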

Poe Dator