I'm following a tutorial that builds a sentiment analysis classifier using BERT with the huggingface library, and I'm seeing very odd behavior: when I run the BERT model on a sample text, I get strings instead of the hidden states. This is the code I'm using:

import transformers
from transformers import BertModel, BertTokenizer

print(transformers.__version__)

PRE_TRAINED_MODEL_NAME = 'bert-base-cased'
PATH_OF_CACHE = "/home/mwon/data-mwon/paperChega/src_classificador/data/hugingface"

tokenizer = BertTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME, cache_dir=PATH_OF_CACHE)

sample_txt = 'When was I last outside? I am stuck at home for 2 weeks.'

encoding_sample = tokenizer.encode_plus(
  sample_txt,
  max_length=32,
  add_special_tokens=True, # Add '[CLS]' and '[SEP]'
  return_token_type_ids=False,
  padding=True,
  truncation=True,
  return_attention_mask=True,
  return_tensors='pt',  # Return PyTorch tensors
)

bert_model = BertModel.from_pretrained(PRE_TRAINED_MODEL_NAME, cache_dir=PATH_OF_CACHE)


last_hidden_state, pooled_output = bert_model(
  encoding_sample['input_ids'],
  encoding_sample['attention_mask']
)

print([last_hidden_state,pooled_output])

that outputs:

4.0.0
['last_hidden_state', 'pooler_output']
 
– Miguel

2 Answers


While the answer from Aakash provides a solution to the problem, it does not explain the issue. Since the 3.x releases of the transformers library, the models no longer return tuples but dedicated output objects:

o = bert_model(
    encoding_sample['input_ids'],
    encoding_sample['attention_mask']
)
print(type(o))
print(o.keys())

Output:

<class 'transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions'>
odict_keys(['last_hidden_state', 'pooler_output'])
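
This also explains the strings the question reports: the output object is an ordered mapping, and tuple-unpacking a mapping iterates over its keys, just like a plain dict:

```python
# Unpacking any mapping yields its keys, not its values --
# exactly what happened with the model's dict-like output.
d = {'last_hidden_state': 'tensor A', 'pooler_output': 'tensor B'}

a, b = d  # iterates over the keys
print([a, b])  # ['last_hidden_state', 'pooler_output']
```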

You can return to the previous behavior by adding return_dict=False to get a tuple:

o = bert_model(
   encoding_sample['input_ids'],
   encoding_sample['attention_mask'],
   return_dict=False
)

print(type(o))

Output:

<class 'tuple'>

I do not recommend that, because it then becomes ambiguous which element of the tuple is which without consulting the documentation, as the example below shows:

o = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'],
               return_dict=False, output_attentions=True, output_hidden_states=True)
print('I am a tuple with {} elements. You do not know what each element represents without checking the documentation'.format(len(o)))

o = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'],
               output_attentions=True, output_hidden_states=True)
print('I am a cool object and you can access my elements with o.last_hidden_state, o["last_hidden_state"] or even o[0]. My keys are: {}'.format(o.keys()))

Output:

I am a tuple with 4 elements. You do not know what each element represents without checking the documentation
I am a cool object and you can access my elements with o.last_hidden_state, o["last_hidden_state"] or even o[0]. My keys are: odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states', 'attentions'])
– cronoik
    Indeed, I recommend always using `return_dict=True` so that the outputs can be retrieved unambiguously from the dictionary returned by the model. – stackoverflowuser2010 Dec 10 '20 at 05:04
  • How to decode the output of bertmodel to get the sentence or string? – shaik moeed Jun 16 '21 at 07:41
  • The output of the bert_model is just a contextualized representation of your input and the sentence is still the same. You can simply perform `tokenizer.decode(input_ids)`. In case you have a different layer on top of bert, this is different. Please open your own question in that case. @shaikmoeed – cronoik Jun 17 '21 at 09:12
  • @cronoik, can you share how to convert model output back to text? I assume this output contains features in tensor form and I would like to see it as a text – Gleichmut Aug 16 '23 at 10:09

I faced the same issue while learning how to implement BERT. I noticed that using

last_hidden_state, pooled_output = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'])

is the issue. Use:

outputs = bert_model(encoding_sample['input_ids'], encoding_sample['attention_mask'])

and extract the last hidden state using

outputs[0]

You can refer to the documentation here, which tells you what is returned by BertModel.

  • How to convert model output back to text? I assume this output contains features in tensor form and I would like to see it as a text – Gleichmut Aug 16 '23 at 10:08