With this code:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# https://huggingface.co/Helsinki-NLP/opus-mt-fr-en
# https://huggingface.co/Helsinki-NLP/opus-mt-en-fr
tokenizer_fr_en = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
model_fr_en = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
fr_text = "Le soleil brille, c'est une belle journée pour se promener."
# Tokenize text
tokenized_text = tokenizer_fr_en.prepare_seq2seq_batch([fr_text], return_tensors='pt')
# Perform translation and decode the output
translation = model_fr_en.generate(**tokenized_text)
translated_text = tokenizer_fr_en.batch_decode(translation, skip_special_tokens=True)[0]
print("Input_phrase: ", fr_text)
print("Translation: ", translated_text)
I get this warning:
/usr/local/lib/python3.9/dist-packages/transformers/tokenization_utils_base.py:3712: FutureWarning:
prepare_seq2seq_batch is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular __call__ method to prepare your inputs and targets.

Here is a short example:

model_inputs = tokenizer(src_texts, text_target=tgt_texts, ...)

If you either need to use different keyword arguments for the source and target texts, you should do two calls like this:

model_inputs = tokenizer(src_texts, ...)
labels = tokenizer(text_target=tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice. For a more complete example, see the implementation of prepare_seq2seq_batch.
I'm not sure how to act on this warning because I don't know what "the regular __call__ method" refers to.
Can you suggest how to replace prepare_seq2seq_batch?
Thank you.
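
For reference, here is a minimal sketch of what the warning seems to be asking for. In Python, writing obj(...) is shorthand for obj.__call__(...), so "the regular __call__ method" most likely just means calling the tokenizer object as if it were a function. The names Shout, translate and load_fr_en are hypothetical helpers introduced only for illustration. The padding=True and truncation=True arguments are an assumption meant to mirror what prepare_seq2seq_batch did by default, and can probably be dropped for a single short sentence.

```python
class Shout:
    """Toy class (hypothetical) that only shows what __call__ does."""

    def __call__(self, text):
        return text.upper()


shout = Shout()
# Calling the object is the same as calling its __call__ method.
assert shout("hi") == shout.__call__("hi")


def translate(texts, tokenizer, model):
    """Translate a list of strings with a seq2seq tokenizer/model pair."""
    # Before: tokenizer.prepare_seq2seq_batch(texts, return_tensors="pt")
    # After:  call the tokenizer object directly.
    # padding/truncation are assumed here to match the old helper's defaults.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**inputs)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def load_fr_en(name="Helsinki-NLP/opus-mt-fr-en"):
    """Load the tokenizer and model used in the question."""
    # Imported inside the function so that nothing is downloaded
    # until this function is actually called.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    return tokenizer, model
```

With these helpers, the original script would reduce to something like tokenizer_fr_en, model_fr_en = load_fr_en() followed by print(translate([fr_text], tokenizer_fr_en, model_fr_en)[0]). The text_target argument mentioned in the warning should only matter when preparing labels for training, not for plain translation as done here.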