I am writing deep learning code that embeds text using a BERT-based model. I am seeing an unexpected error in code that was working fine before. Below is the snippet:
from transformers import DistilBertModel, DistilBertTokenizer

sentences = ["person in red riding a motorcycle", "lady cutting cheese with reversed knife"]
# Embed text using a BERT model.
text_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', cache_dir="cache/")
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
print(text_tokenizer.tokenize(sentences[0]))
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)  # the error occurs here
The output and error are below:
['person', 'in', 'red', 'riding', 'a', 'motorcycle']
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 92, in <module>
load_data()
File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 59, in load_data
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)
TypeError: 'DistilBertTokenizer' object is not callable
As you can see, text_tokenizer.tokenize() works fine. I tried force-downloading the tokenizer and even changing the cache directory, but to no effect.
The code runs fine on another machine (a friend's laptop) and was also working on mine until I installed torchvision and started using the PIL library for the image part. Now it always gives this error.
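Since the torchvision install may have changed the environment, one thing worth checking is the installed transformers version: as far as I know, the callable tokenizer API (tokenizer(...)) was only introduced in transformers v3.0.0, so an older version would raise exactly this TypeError. A minimal sketch of that check (has_callable_tokenizer is a hypothetical helper name, not part of any library):

```python
import importlib.metadata  # stdlib since Python 3.8


def has_callable_tokenizer(version: str) -> bool:
    # Assumption: tokenizers became callable in transformers v3.0.0,
    # so any major version >= 3 should support tokenizer(...).
    major = int(version.split(".")[0])
    return major >= 3


# Uncomment to check the version actually installed in the Conda env:
# print(importlib.metadata.version("transformers"))
# On pre-3.0 versions, tokenizer.batch_encode_plus(...) was the rough
# equivalent of calling the tokenizer directly.

print(has_callable_tokenizer("2.5.1"))   # an old version -> False
print(has_callable_tokenizer("4.30.2"))  # a recent version -> True
```

If the version printed in the broken environment is below 3.0, that would explain why the same code works on the other machine.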
OS: macOS 11.6, Conda environment, Python 3.9.