
I tried to use the DistilBERT classifier, but I am getting the following error.

This is my code:

from ktrain import text

(X_train, y_train), (X_test, y_test), prepro = text.texts_from_df(
    train_df=data_train, text_column="Cleaned", label_columns=col,
    val_df=data_test, maxlen=500, preprocess_mode="distilbert")

And here is the error:

OSError: Model name 'distilbert-base-uncased' was not found in tokenizers model name list (distilbert-base-uncased, distilbert-base-uncased-distilled-squad, distilbert-base-cased, distilbert-base-cased-distilled-squad, distilbert-base-german-cased, distilbert-base-multilingual-cased). We assumed 'distilbert-base-uncased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Because of constraints in my office environment, I can only work with TensorFlow 2.2 and Python 3.8. Right now I am using ktrain 0.19.

Do you think downgrading to 0.16 would affect my current environment?

1 Answer


This error can occur when a network or firewall issue prevents the tokenizer files from being downloaded. See this FAQ entry for remedies.
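To check whether the machine can reach the download host at all, a plain HTTP(S) probe from Python's standard library is often enough. This is a minimal sketch, not part of ktrain; `can_reach` is a hypothetical helper name, and which host to probe is an assumption (recent transformers versions download from `https://huggingface.co`, older ones used a different host). Note that a corporate proxy that answers with its own block page will still count as "reachable".

```python
import urllib.error
import urllib.request


def can_reach(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP(S) request to `url` gets any response.

    An HTTP error status (e.g. 403 or 404) still proves the server
    answered, so it counts as reachable. DNS failures, refused
    connections, proxy/firewall drops, and timeouts count as unreachable.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass):
        # the server responded, just not with a 2xx/3xx status.
        err.close()
        return True
    except (urllib.error.URLError, OSError):
        # No usable response: name resolution, connection, or timeout failure.
        return False
```

For example, `can_reach("https://huggingface.co")` returning False on the office machine but True on a home machine would point to the network/firewall cause rather than to the ktrain version.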

Also, when you use preprocess_mode='distilbert', the texts_from* functions return TransformerDataset instances, not arrays. For example, you'll need to replace (X_train, y_train) with a single variable such as train_data. See this example notebook.
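Applied to the code in the question, the workflow might look roughly like this sketch. It assumes `data_train`, `data_test`, and `col` are the same objects as in the question; the helper name `train_distilbert`, the learning rate, the single epoch, and `batch_size=6` are illustrative choices, not values from the FAQ or the notebook. The ktrain imports are placed inside the function so that defining it does not require ktrain to be installed.

```python
def train_distilbert(data_train, data_test, col, lr=5e-5, epochs=1, batch_size=6):
    """Hypothetical helper: train a DistilBERT classifier with ktrain."""
    # Deferred imports: only needed when the function is actually called.
    import ktrain
    from ktrain import text

    # With preprocess_mode="distilbert", texts_from_df returns
    # TransformerDataset objects for train/val, not (X, y) arrays.
    train_data, val_data, preproc = text.texts_from_df(
        train_df=data_train,
        text_column="Cleaned",
        label_columns=col,
        val_df=data_test,
        maxlen=500,
        preprocess_mode="distilbert",
    )

    # Pass the datasets themselves (not X/y arrays) to the model and learner.
    model = text.text_classifier("distilbert", train_data=train_data, preproc=preproc)
    learner = ktrain.get_learner(
        model, train_data=train_data, val_data=val_data, batch_size=batch_size
    )
    learner.fit_onecycle(lr, epochs)
    return learner
```

The key change from the question's code is the unpacking on the texts_from_df line: three plain variables instead of two (X, y) tuples.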

blustax