I tried to fine-tune this model I found on Hugging Face (repo: https://github.com/flexudy-pipe/sentence-doctor) to make it perform better on French, but I have run into a problem.
I used the train_any_t5_task.py script the author provides (https://github.com/flexudy-pipe/sentence-doctor/blob/master/train_any_t5_task.py) to fine-tune the model.
After a few modifications I got it running, and it did produce a model.
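For context, at the end of training I save the fine-tuned model and its tokenizer with the standard save_pretrained calls. This is only a minimal sketch of that step, not the author's exact code (the output directory name is my own, and I load the published base checkpoint from the repo README here just so the snippet is self-contained):

from transformers import T5ForConditionalGeneration, T5Tokenizer

# In my real run these are the fine-tuned model and tokenizer, not the fresh base checkpoint.
model = T5ForConditionalGeneration.from_pretrained("flexudy/t5-base-multi-sentence-doctor")
tokenizer = T5Tokenizer.from_pretrained("flexudy/t5-base-multi-sentence-doctor")

output_dir = "t5-base-multi-your-sentence-doctor"
model.save_pretrained(output_dir)      # writes config.json and pytorch_model.bin
tokenizer.save_pretrained(output_dir)  # writes spiece.model and the tokenizer config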
However, when I try to use this model with the inference code provided by the author, I always get an error (I tried it both on Google Colab and locally).
Here is the code I ran:
from transformers import AutoTokenizer, AutoModelWithLMHead

# Path to my fine-tuned model; a raw string avoids backslash-escape issues on Windows.
# I passed local_files_only=True only for the local run.
model_path = r"D:\model\t5-base-multi-your-sentence-doctor"
tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
model = AutoModelWithLMHead.from_pretrained(model_path, local_files_only=True)

# Broken French sentence that needs to be repaired.
input_text = "repair_sentence: j\\ sui malade"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(input_ids, max_length=32, num_beams=1)
sentence = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
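To rule out a simple path problem, this is how I check what is actually in the model directory (a T5 checkpoint saved with save_pretrained should contain at least config.json, pytorch_model.bin and spiece.model):

import os

model_path = r"D:\model\t5-base-multi-your-sentence-doctor"
# I expect to see config.json, pytorch_model.bin and spiece.model in the listing.
print(os.listdir(model_path))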
Here is the error I always get:
RuntimeError: Internal: C:\projects\sentencepiece\src\sentencepiece_processor.cc(891) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
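Since the traceback comes from SentencePiece, a direct way to test whether the tokenizer file itself is readable is to load it with the sentencepiece library (this assumes the file is named spiece.model, which is the T5 default):

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
# This raises the same ParseFromArray error if the file is corrupt or not a valid SentencePiece model.
sp.Load(r"D:\model\t5-base-multi-your-sentence-doctor\spiece.model")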
By the way, here is the link to the Google Colab notebook where you will find the code I ran to train the new model: https://colab.research.google.com/drive/1jRNgVESZh-o42o0OzpNN51JgyI74diaY?usp=sharing
Can someone help me with that, please?