
I would like to use sentence_transformers, but due to policy restrictions I cannot install the sentence-transformers package.

I do have the transformers and torch packages, though.

I went to this page and tried to run the code below.

Before doing that, I downloaded all the files from the page:

import os
path="/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/" #local path where I have stored files
os.listdir(path)

['.dominokeep',
 'config.json',
 'data_config.json',
 'modules.json',
 'sentence_bert_config.json',
 'special_tokens_map.json',
 'tokenizer_config.json',
 'train_script.py',
 'vocab.txt',
 'tokenizer.json',
 'config_sentence_transformers.json',
 'README.md',
 'gitattributes',
 '9e1e76b7a067f72e49c7f571cd8e811f7a1567bec49f17e5eaaea899e7bc2c9e']

The code that I ran is

from transformers import AutoTokenizer, AutoModel
import torch

# Load model from HuggingFace Hub

path="/yz/sentence-transformers/multi-qa-mpnet-base-dot-v1/"

"""tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")
model = AutoModel.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")"""

tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)
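
For reference, what I eventually want to run is the encoding step from the same model-page example. Below is a sketch of it (the `cls_pooling`/`encode` helper names follow that example; as far as I understand, this dot-product model uses CLS pooling):

def cls_pooling(model_output):
    # CLS pooling: take the embedding of the first token ([CLS])
    return model_output.last_hidden_state[:, 0]

def encode(texts):
    # Tokenize the sentences, run them through the model, and pool
    encoded_input = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        model_output = model(**encoded_input, return_dict=True)
    return cls_pooling(model_output)

query_emb = encode("How many people live in London?")
doc_emb = encode(["Around 9 million people live in London."])
scores = torch.mm(query_emb, doc_emb.transpose(0, 1))  # dot-product scores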

The error that I get is below:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-bb33f7c519e0> in <module>
     32 model = AutoModel.from_pretrained("sentence-transformers/multi-qa-mpnet-base-dot-v1")"""
     33 
---> 34 tokenizer = AutoTokenizer.from_pretrained(path)
     35 model = AutoModel.from_pretrained(path)
     36 

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    308         config = kwargs.pop("config", None)
    309         if not isinstance(config, PretrainedConfig):
--> 310             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    311 
    312         if "bert-base-japanese" in str(pretrained_model_name_or_path):

/usr/local/anaconda/lib/python3.6/site-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    342 
    343         if "model_type" in config_dict:
--> 344             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    345             return config_class.from_dict(config_dict, **kwargs)
    346         else:

KeyError: 'mpnet'
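
For what it's worth, the lookup that fails is the CONFIG_MAPPING dict shown in the traceback, so I can confirm directly that my install does not know the 'mpnet' model type (the import path below is taken from the traceback):

import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

print(transformers.__version__)   # 4.0.0
print("mpnet" in CONFIG_MAPPING)  # False -- exactly the KeyError above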

My questions:

  1. How should I fix this error?
  2. Is there a way to use the same method for MiniLM-L6-H384-uncased? I would like to use it, as it seems to be faster.

Package versions:

transformers - 4.0.0
torch - 1.4.0
  • I will share my transformers version soon. Were you able to get MiniLM-L6-H384-uncased to work? – user2543622 Oct 23 '21 at 17:02
  • Package versions are `transformers - 4.0.0` and `torch - 1.4.0`... which version of transformers are you using? – user2543622 Oct 23 '21 at 22:15
  • MPNet was added with transformers 4.1.0. Can you upgrade your package? I haven't tried it, but `MiniLM-L6-H384-uncased` seems to be a BERT model and you should be able to load it with 4.0.0. – cronoik Oct 24 '21 at 11:34
  • Could you try `MiniLM-L6-H384-uncased`? I am having issues with it... I might not be able to update my package, and `MiniLM-L6-H384-uncased` seems to be the only option... I don't remember now, but I think I was only able to get the tokenizer working for it; `model = AutoModel.from_pretrained(path)` failed :(. – user2543622 Oct 24 '21 at 14:42
  • You are right, you receive an error message since the pytorch_model.bin was created with a newer version: `RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)`. Maybe you can check whether someone created a conversion script. – cronoik Oct 24 '21 at 15:45

1 Answer


The answer is simple: you cannot use the "MiniLM-L6-H384-uncased" model with pytorch 1.4.0.

print(torch.__version__)
# 1.4.0

torch.load("/content/MiniLM-L6-H384-uncased/pytorch_model.bin", location="cpu")

"""RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED 
at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to 
PyTorch. Attempted to read a PyTorch file with version 3, but the maximum 
supported version for reading is 2. Your PyTorch installation may be too old. 
(init at /pytorch/caffe2/serialize/inline_container.cc:132)"""
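
If upgrading pytorch is not possible, one potential workaround (untested; a sketch that assumes access to some environment with torch >= 1.6, e.g. a Colab session) is to re-save the checkpoint there in the legacy serialization format, which torch 1.4.0 can still read:

# Run this once in an environment with torch >= 1.6
import torch

state_dict = torch.load("/content/MiniLM-L6-H384-uncased/pytorch_model.bin",
                        map_location="cpu")
# _use_new_zipfile_serialization=False writes the pre-1.6 legacy format
torch.save(state_dict, "/content/MiniLM-L6-H384-uncased/pytorch_model.bin",
           _use_new_zipfile_serialization=False)

After that, `AutoModel.from_pretrained` should be able to load the folder under torch 1.4.0, assuming the installed transformers version supports the architecture.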