I tried to fine-tune this model I found on Hugging Face (repo: https://github.com/flexudy-pipe/sentence-doctor) to make it perform better on French, but I ran into a problem.

I used the train_any_t5_task.py script provided by the author (https://github.com/flexudy-pipe/sentence-doctor/blob/master/train_any_t5_task.py) to fine-tune the model. After a few modifications it ran and produced a model.
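
For context, my understanding is that the output directory is only loadable later if both the model weights and the tokenizer are saved explicitly. Here is a minimal sketch of that saving step, assuming the standard transformers save_pretrained API (the Hub id and output path below are illustrative, not the exact values from my script):

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Start from the published checkpoint (Hub id as I believe it appears there)
model = T5ForConditionalGeneration.from_pretrained("flexudy/t5-base-multi-sentence-doctor")
tokenizer = T5Tokenizer.from_pretrained("flexudy/t5-base-multi-sentence-doctor")

# ... fine-tuning on the French data happens here ...

output_dir = "t5-base-multi-your-sentence-doctor"  # placeholder path
model.save_pretrained(output_dir)      # writes the weights and config.json
tokenizer.save_pretrained(output_dir)  # writes spiece.model and tokenizer configs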

However, when I try to use this model with the inference code provided by the author, I always get an error (I tried it both on Google Colab and locally).

Here is the code I ran:

from transformers import AutoTokenizer, AutoModelWithLMHead

# Path to my fine-tuned model (raw string avoids Windows backslash escapes);
# local_files_only=True was only added when I ran it locally
model_path = r"D:\model\t5-base-multi-your-sentence-doctor"
tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)

model = AutoModelWithLMHead.from_pretrained(model_path, local_files_only=True)

# Broken French sentence that needs to be repaired
input_text = "repair_sentence: j' sui malade"

input_ids = tokenizer.encode(input_text, return_tensors="pt")

outputs = model.generate(input_ids, max_length=32, num_beams=1)

sentence = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)

Here is the error I always get:

RuntimeError: Internal: C:\projects\sentencepiece\src\sentencepiece_processor.cc(891) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
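
From what I can tell, this error comes from SentencePiece failing to parse the tokenizer's spiece.model file, which usually means the file is missing, truncated, or not a valid SentencePiece model. Here is a hypothetical sanity check (standard library only; the path is my local model directory) that could confirm whether the file is there and non-empty:

import os

# List every file in the model directory with its size
model_path = r"D:\model\t5-base-multi-your-sentence-doctor"
for name in os.listdir(model_path):
    size = os.path.getsize(os.path.join(model_path, name))
    print(f"{name}: {size} bytes")

# The tokenizer cannot load if spiece.model is missing or empty
spiece = os.path.join(model_path, "spiece.model")
print("spiece.model usable:", os.path.isfile(spiece) and os.path.getsize(spiece) > 0)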

By the way, here is the link to the Google Colab notebook containing the code I ran to train the new model: https://colab.research.google.com/drive/1jRNgVESZh-o42o0OzpNN51JgyI74diaY?usp=sharing

Can someone help me with that, please?
