I have used BERT with HuggingFace and PyTorch, with a DataLoader and a sampler for training and evaluation. Below is the code for that:

!pip install transformers==3.5.1
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModel, BertTokenizerFast

bert = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')


def textToTensor(text,labels=None,paddingLength=30):
  
  tokens = tokenizer.batch_encode_plus(text.tolist(), max_length=paddingLength, padding='max_length', truncation=True)
  
  text_seq = torch.tensor(tokens['input_ids'])
  text_mask = torch.tensor(tokens['attention_mask'])

  text_y = None
  if isinstance(labels,np.ndarray): # labels are optional; convert them only when they are provided
    text_y = torch.tensor(labels.tolist())

  return text_seq, text_mask, text_y


text = test_df['text'].values

seq,mask,_ = textToTensor(text,paddingLength=35)
data = TensorDataset(seq,mask)
dataloader = DataLoader(data,batch_size=1)

# `model` (the fine-tuned classifier) and `device` are defined earlier, outside this snippet
for step,batch in enumerate(dataloader):
  batch = [t.to(device) for t in batch]
  sent_id, mask = batch

  with torch.no_grad():
    print(np.argmax(model(sent_id, mask).detach().cpu().numpy(),1))

This gives me a NumPy array as the result. Since `batch_size=1` and no sampler is used here, each batch prints a single-element array containing the predicted class.

I have two questions:

Are the results strictly in the same order as the rows of test_df['text']?

How can I get the prediction for a single sentence, such as "Hello make my prediction. I am a single sentence"?

Can someone please help me make a single prediction?
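To make the ordering question concrete, here is a minimal sketch of one way to check it. It uses hypothetical stand-in tensors instead of the real tokenized `test_df['text']`, and the extra `row_ids` tensor is an assumption added purely for illustration: it carries each example's original row position through the `TensorDataset` so the order in which batches arrive can be inspected.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the tokenized test set: 10 rows, 35 token ids each.
seq = torch.arange(10).unsqueeze(1).repeat(1, 35)
mask = torch.ones_like(seq)
row_ids = torch.arange(len(seq))  # original row position of every example

# Carry the row position through the dataset so each batch reports where it came from.
data = TensorDataset(seq, mask, row_ids)
dataloader = DataLoader(data, batch_size=4, shuffle=False)

seen = []
for sent_id, attn_mask, idx in dataloader:
    seen.extend(idx.tolist())

# With shuffle=False and no custom sampler, batches arrive in dataset order.
in_order = seen == list(range(len(seq)))
print(in_order)
```

If the same check is run with `shuffle=True` or a random sampler, `seen` would generally no longer match the row order, in which case the predictions would need to be re-aligned using the collected indices.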

Deshwal
  • What is the size of numpy array? – Ashwin Geet D'Sa Dec 01 '20 at 11:16
  • @AshwinGeetD'Sa it is something like `[2]`, `[3]`. I know I can do this, but is there a faster approach? Also, if I am not using any sampler, does this mean that my predictions are in sequence? – Deshwal Dec 01 '20 at 18:28
  • By default, the `DataLoader` uses `shuffle=False`, so the predictions should be in sequential order – Ashwin Geet D'Sa Dec 01 '20 at 21:00
  • Don't you think `[2] or [3]` are correct class labels? – Ashwin Geet D'Sa Dec 01 '20 at 21:00
  • Yeah, I got the expected accuracy. I am asking if there is a simple way to do this. Sometimes I have to test on batches and sometimes on a single sentence. Is there a faster and more efficient way to do this? – Deshwal Dec 02 '20 at 06:21
  • You can do it with a larger batch size. I am not sure what exact problem you are facing. If you want to get the prediction for a single sentence, set `batch_size=1` – Ashwin Geet D'Sa Dec 02 '20 at 09:08
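As a rough illustration of the batch-of-one idea from the comments, the sketch below wraps tokenization and inference in a single helper that accepts either one sentence or a list of sentences. The name `predict_texts` is hypothetical, and the helper assumes (as in the question's loop) that `model(sent_id, mask)` returns a tensor of shape `(batch, num_classes)`; it is not a confirmed solution for the asker's model.

```python
import torch


def predict_texts(texts, tokenizer, model, device="cpu", max_length=35):
    """Return one predicted class index per input text, in input order."""
    # Accept a bare string by treating it as a batch of one sentence.
    if isinstance(texts, str):
        texts = [texts]

    # Same tokenizer call as in the question's textToTensor helper.
    tokens = tokenizer.batch_encode_plus(
        list(texts), max_length=max_length, padding="max_length", truncation=True
    )
    sent_id = torch.tensor(tokens["input_ids"]).to(device)
    mask = torch.tensor(tokens["attention_mask"]).to(device)

    model.eval()  # disable dropout for inference
    with torch.no_grad():
        # Assumption: the classifier returns logits of shape (batch, num_classes).
        logits = model(sent_id, mask)
    return logits.argmax(dim=1).cpu().tolist()
```

Under these assumptions, `predict_texts("Hello make my prediction. I am a single sentence", tokenizer, model, device)` would return a one-element list, and passing `test_df['text'].tolist()` would return one prediction per row without building a `DataLoader`. For large inputs, batching through a `DataLoader` as in the question is likely still preferable to keep memory use bounded.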

0 Answers