I need to get the last layer of embeddings from a BERT model using HuggingFace. The following code works, but it is extremely slow; how can I speed it up?
This is a toy example; my real data consists of thousands of examples with long texts.
import pandas as pd
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def getPrediction(text):
    encoded_input = tokenizer(text, return_tensors='pt')
    outputs = model(**encoded_input)
    embedding = outputs[0][:, -1]
    embedding_list = embedding.cpu().detach().tolist()[0]
    return embedding_list
df = pd.DataFrame({'text':['First text', 'Second text']})
results = pd.DataFrame(df.apply(lambda x: getPrediction(x.text), axis=1))