I'm trying to do some basic text inference using the BLOOM model:
from transformers import AutoModelForCausalLM, AutoModel
# checkpoint = "bigscience/bloomz-7b1-mt"
checkpoint = "bigscience/bloom-1b7"
tokenizer = AutoModelForCausalLM.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
# Set the prompt and maximum length
prompt = "This is the prompt text"
max_length = 100000
# Tokenize the prompt
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
# Generate the text
outputs = model.generate(inputs)
result = tokenizer.decode(outputs[0])
# Print the generated text
print(result)
When I run it, I get the following error:
Traceback (most recent call last):
File "/tmp/pycharm_project_444/bloom.py", line 15, in <module>
inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1265, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BloomForCausalLM' object has no attribute 'encode'
Does anyone know what the issue is? The script runs on a remote server, if that matters.
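For reference, my current reading of the traceback: tokenizer is actually a BloomForCausalLM model, because it was loaded with AutoModelForCausalLM.from_pretrained rather than AutoTokenizer.from_pretrained, and models have no encode method. The AutoModel class on the next line looks like a second problem, since it loads the bare BloomModel without the language-modeling head that generate needs. A minimal sketch of what I'd expect the corrected script to look like (untested on my setup; the max_new_tokens value is just an illustration):

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloom-1b7"

# AutoTokenizer loads the tokenizer; AutoModelForCausalLM loads the model with its LM head
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")

prompt = "Translate to English: Je t’aime."

# Tokenize the prompt and move the input IDs to the same device as the model
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

# Generate a bounded number of new tokens (generate's default max_length is only 20)
outputs = model.generate(inputs, max_new_tokens=100)

# Decode the generated token IDs back into text
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Using model.device instead of a hardcoded "cuda" should also keep the input on whatever device device_map="auto" picked for the model. The same loading pattern should apply to the commented-out bigscience/bloomz-7b1-mt checkpoint.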