
I am pretty new to Hugging Face transformers. I am facing the following issue when I try to load the xlm-roberta-base model from a given path:

>>> tokenizer = AutoTokenizer.from_pretrained(model_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 182, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 458, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_roberta.py", line 98, in __init__
    **kwargs,
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 133, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType

However, if I load it by its name, there is no problem:

>>> tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')

I would appreciate any help.


3 Answers


I assume you have created that directory as described in the documentation with:

tokenizer.save_pretrained('YOURPATH')

There is currently an issue under investigation that affects only the AutoTokenizer, not the underlying tokenizers like XLMRobertaTokenizer. For example, the following should work:

from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('YOURPATH')
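
You can sanity-check the restored tokenizer with an arbitrary sample sentence:

# should print SentencePiece subword tokens if the vocab file was found
print(tokenizer.tokenize('Hello world'))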

To work with AutoTokenizer, you also need to save the config so it can be loaded offline:

from transformers import AutoTokenizer, AutoConfig

tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')

tokenizer.save_pretrained('YOURPATH')
config.save_pretrained('YOURPATH')

tokenizer = AutoTokenizer.from_pretrained('YOURPATH')

I recommend either using different paths for the tokenizer and the model, or keeping your model's config.json, because some modifications you apply to your model are stored in the config.json that is created during model.save_pretrained() and will be overwritten if you afterwards save the tokenizer to the same path as described above (i.e. you won't be able to load your modified model with the tokenizer's config.json).
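
A minimal sketch of that separate-path layout (the model/ and tokenizer/ subdirectory names are just examples):

from transformers import AutoConfig, AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('xlm-roberta-base')
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')

# the model writes its own config.json into its directory ...
model.save_pretrained('YOURPATH/model')

# ... while the tokenizer and the config copy it needs go elsewhere,
# so saving them cannot overwrite the model's config.json
tokenizer.save_pretrained('YOURPATH/tokenizer')
config.save_pretrained('YOURPATH/tokenizer')

tokenizer = AutoTokenizer.from_pretrained('YOURPATH/tokenizer')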

I encountered the same error message; to fix it, you can add use_fast=True to the arguments.

generator = AutoTokenizer.from_pretrained(generator_path, config=config.generator, use_fast=True) 
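
Note that generator_path and config.generator come from my own setup; for the original question, a stripped-down sketch would be:

from transformers import AutoTokenizer

# use_fast=True selects the Rust-backed "fast" tokenizer, which loads its
# files through a different code path
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)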

I encountered the same problem. To use models from the local machine, set:

import os
os.environ['TRANSFORMERS_OFFLINE'] = '1'

This tells the library to use local files only. You can read more about it in the Hugging Face documentation under Installation - Offline Mode.

from transformers import RobertaTokenizer
tokenizer = RobertaTokenizer.from_pretrained('Model_Path')

The path should be the location of the model folder relative to the current file's directory. For example, if the model files are in an xlm-roberta-base folder inside a models folder, the path should be 'models/xlm-roberta-base/'.
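
Putting it together, a minimal sketch (assuming the files were saved under models/xlm-roberta-base/ relative to the working directory):

import os

# set the flag before transformers needs it, so the library never tries to reach the Hub
os.environ['TRANSFORMERS_OFFLINE'] = '1'

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('models/xlm-roberta-base/')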