
I am trying to learn how to use BERT. Here is the code:

from sklearn.datasets import fetch_20newsgroups
# Load the full 20 newsgroups corpus (~18,000 documents)
data = fetch_20newsgroups(subset='all')['data']

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
# Encode every document into a dense sentence embedding
embeddings = model.encode(data, show_progress_bar=True)

The problem is that it is incredibly slow: 24-48 hours to complete.

I have a macOS notebook with an M1 Pro chip. What can be done to speed up the process?

Thank you

Toly
  • Maybe the problem is that you are using the CPU instead of the GPU? – dankal444 Jul 03 '23 at 12:01
  • @dankal444 you mean adding: embeddings = model.encode(data, show_progress_bar=True, device='cuda') ? – Toly Jul 03 '23 at 14:10
  • I don't know the specifics of this library or how to make it use the GPU. I'm just saying that your timings suggest you are not using the GPU. You can look at your CPU/GPU usage during encoding. – dankal444 Jul 03 '23 at 16:48
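
Following up on the comment about GPU usage: below is a minimal sketch of how one might check for and use the Apple Silicon GPU (PyTorch's MPS backend) with sentence-transformers, assuming a reasonably recent PyTorch and sentence-transformers install; the device choice and batch_size value are assumptions to experiment with, not a verified fix.

import torch
from sentence_transformers import SentenceTransformer

# Use the Apple Silicon GPU (MPS backend) if this PyTorch build supports it,
# otherwise fall back to the CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"

model = SentenceTransformer('distilbert-base-nli-mean-tokens', device=device)

# Larger batches usually improve throughput; 64 is only a starting point to tune
embeddings = model.encode(data, batch_size=64, show_progress_bar=True)

Encoding a small slice first (for example data[:100]) makes it easy to time each device before committing to the full corpus.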

0 Answers