
I'm trying to run the 'transformers' version of this code to use the new pre-trained BERTweet model, and I'm getting an error.

The following lines of code ran successfully in my Google Colab notebook:


!pip install fairseq
import fairseq
!pip install fastBPE
import fastBPE

# download the pre-trained BERTweet model zipped file
!wget https://public.vinai.io/BERTweet_base_fairseq.tar.gz

# unzip the pre-trained BERTweet model files
!tar -xzvf BERTweet_base_fairseq.tar.gz

!pip install transformers
import transformers

import torch
import argparse

from transformers import RobertaConfig
from transformers import RobertaModel

from fairseq.data.encoders.fastbpe import fastBPE
from fairseq.data import Dictionary

Then I tried to run the following code:

# Load model
config = RobertaConfig.from_pretrained(
    "/Absolute-path-to/BERTweet_base_transformers/config.json"
)
BERTweet = RobertaModel.from_pretrained(
    "/Absolute-path-to/BERTweet_base_transformers/model.bin",
    config=config
)

...and an error was displayed:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    242             if resolved_config_file is None:
--> 243                 raise EnvironmentError
    244             config_dict = cls._dict_from_json_file(resolved_config_file)

OSError: 

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
2 frames
/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    250                 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
    251             )
--> 252             raise EnvironmentError(msg)
    253 
    254         except json.JSONDecodeError:

OSError: Can't load config for '/Absolute-path-to/BERTweet_base_transformers/config.json'. Make sure that:

- '/Absolute-path-to/BERTweet_base_transformers/config.json' is a correct model identifier listed on 'https://huggingface.co/models'

- or '/Absolute-path-to/BERTweet_base_transformers/config.json' is the correct path to a directory containing a config.json file

I'm guessing the issue is that I need to replace '/Absolute-path-to' with something else, but if that's the case, what should it be replaced with? It's likely a very simple answer and I feel stupid for asking, but I need help.


1 Answer


First of all, you have to download the proper package, as described in the GitHub README:

!wget https://public.vinai.io/BERTweet_base_transformers.tar.gz

!tar -xzvf BERTweet_base_transformers.tar.gz

After that, you can click on the directory icon (left side of your screen) to list the downloaded data.

Right-click on BERTweet_base_transformers, choose "Copy path", and paste the path from your clipboard into your code:

config = RobertaConfig.from_pretrained(
    "/content/BERTweet_base_transformers/config.json"
)

BERTweet = RobertaModel.from_pretrained(
    "/content/BERTweet_base_transformers/model.bin",
    config=config
)
    Ahhh yes it looks like I have downloaded the fairseq files: ( !wget https://public.vinai.io/BERTweet_base_fairseq.tar.gz !tar -xzvf BERTweet_base_fairseq.tar.gz ) but not the transformers files. Thank you! – code_to_joy Jun 16 '20 at 13:36
  • I had a similar problem on a Linux machine. It was caused by wrong permissions on the directory ~/.cache/huggingface, where the models are downloaded. – Alexander Borochkin Dec 07 '21 at 15:08