
To compare different paragraphs, I am trying to use a transformer model: I feed each paragraph to the model and then compare the outputs to see which paragraphs are most similar.

For this purpose, I am using the RoBERTa-base model. I first run the RoBERTa tokenizer on a paragraph and then pass the tokenized output to the RoBERTa model. However, the process fails due to lack of memory: even 25 GB of RAM is not enough to complete it for a paragraph with 1324 lines.

Any idea how I can make this better, or any suggestions about what mistakes I might be making?

from transformers import RobertaTokenizer, RobertaModel
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

model = RobertaModel.from_pretrained("roberta-base").to(device)

inputs = tokenizer(dict_anrika['Anrika'], return_tensors="pt", truncation=True,
                   padding=True).to(device)
outputs = model(**inputs)
akshit bhatia
  • Can you be more specific about the data? 1324 lines sounds almost like a book, not a paragraph... Also, for a direct comparison of the embeddings using cosine distance, sentence transformers might be better. If your texts are that long, you should consider document-level models like Longformer or BigBird. – Jindřich Jan 25 '23 at 09:47
  • So the 1324 lines are WhatsApp messages belonging to one person, all the messages that he/she has sent to me. I am trying to encode them with the aim of capturing the writing style of the author. Although I know that this model by itself won't do that, I still want to see how the embeddings of two different authors can differ. – akshit bhatia Jan 25 '23 at 13:49
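
Following up on the cosine-distance suggestion in the comments: a minimal sketch of comparing two authors' messages with the sentence-transformers package. The model name all-MiniLM-L6-v2 and the lists messages_a and messages_b are placeholders, not anything from the question.

from sentence_transformers import SentenceTransformer, util

st_model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder; any sentence-embedding model works

# messages_a / messages_b: hypothetical lists of message strings, one list per author
emb_a = st_model.encode(messages_a, convert_to_tensor=True)
emb_b = st_model.encode(messages_b, convert_to_tensor=True)

# crude author-level comparison: cosine similarity of the mean message embeddings
similarity = util.cos_sim(emb_a.mean(dim=0), emb_b.mean(dim=0))
print(similarity.item())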

1 Answer


Sounds like you gave the model an input of shape [1324, longest_length_in_batch], which is huge. I tried a [1000, 512] input and found that even a server with 200 GB of RAM hits OOM.

One solution is to break the huge input into smaller batches, for example 10 lines at a time.
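
As an illustration, here is a minimal batching sketch, assuming dict_anrika['Anrika'] from the question is a plain list of message strings; the batch size of 10 and max_length of 128 are arbitrary examples.

import torch
from transformers import RobertaTokenizer, RobertaModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base").to(device)
model.eval()

lines = dict_anrika["Anrika"]  # assumed: a list of message strings
batch_size = 10
all_embeddings = []

with torch.no_grad():  # inference only, so gradients are not stored
    for i in range(0, len(lines), batch_size):
        batch = lines[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           padding=True, max_length=128).to(device)
        outputs = model(**inputs)
        # keep only the first-token vector per line and move it off the GPU,
        # so memory use stays bounded by the batch size
        all_embeddings.append(outputs.last_hidden_state[:, 0, :].cpu())

embeddings = torch.cat(all_embeddings)  # shape: [num_lines, hidden_size]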

eval
  • Thanks for the reply. I used max_length=50 and the model now runs, though it took 40 GB of RAM. One question: I encoded the messages of one person using this model. To encode the messages of a second person, do I instantiate the model again with a new RoBERTa class, or do I use the previously instantiated model on which the first person's messages were encoded? I hope the question is clear. – akshit bhatia Jan 28 '23 at 21:41
  • In my understanding, you can just use the same model object to run multiple inferences. – eval Jan 29 '23 at 02:47
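
To illustrate that last comment: a sketch that reuses the same model object for a second person and then compares the two authors. Here encode_lines is a hypothetical helper wrapping the batched loop from the answer, and the key 'OtherPerson' is made up.

import torch.nn.functional as F

# encode_lines: hypothetical helper that runs the batched loop above and
# returns a [num_lines, hidden_size] tensor for a list of messages
emb_person_a = encode_lines(dict_anrika["Anrika"])       # first person
emb_person_b = encode_lines(dict_anrika["OtherPerson"])  # second person, same model object

# compare the two authors via cosine similarity of their mean embeddings
similarity = F.cosine_similarity(emb_person_a.mean(dim=0),
                                 emb_person_b.mean(dim=0), dim=0)
print(similarity.item())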