
With this code:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# https://huggingface.co/Helsinki-NLP/opus-mt-fr-en
# https://huggingface.co/Helsinki-NLP/opus-mt-en-fr

tokenizer_fr_en = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
model_fr_en = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-fr-en")

fr_text = "Le soleil brille, c'est une belle journée pour se promener."

# Tokenize text
tokenized_text = tokenizer_fr_en.prepare_seq2seq_batch([fr_text], return_tensors='pt')

# Perform translation and decode the output
translation = model_fr_en.generate(**tokenized_text)
translated_text = tokenizer_fr_en.batch_decode(translation, skip_special_tokens=True)[0]

print("Input_phrase: ", fr_text)
print("Translation: ", translated_text)

I get:

/usr/local/lib/python3.9/dist-packages/transformers/tokenization_utils_base.py:3712: FutureWarning: prepare_seq2seq_batch is deprecated and will be removed in version 5 of HuggingFace Transformers. Use the regular __call__ method to prepare your inputs and targets.

Here is a short example:

model_inputs = tokenizer(src_texts, text_target=tgt_texts, ...)

If you either need to use different keyword arguments for the source and target texts, you should do two calls like this:

model_inputs = tokenizer(src_texts, ...)
labels = tokenizer(text_target=tgt_texts, ...)
model_inputs["labels"] = labels["input_ids"]

See the documentation of your specific tokenizer for more details on the specific arguments to the tokenizer of choice. For a more complete example, see the implementation of prepare_seq2seq_batch.

It's hard for me to act on this because I'm not sure what "use the regular __call__ method" means.

Can you suggest how to replace prepare_seq2seq_batch?

Thank you.

LeMoussel
  • You might just replace it with `tokenized_text = tokenizer_fr_en([fr_text], return_tensors='pt')`. Method `__call__` (if implemented) gives the instances the ability to behave like functions. – amiola Mar 22 '23 at 15:43
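To expand on the comment above: `__call__` is the special method Python invokes when an instance is used like a function, which is why `tokenizer_fr_en([fr_text], return_tensors='pt')` works directly. A minimal sketch of the mechanism (the `Greeter` class is hypothetical, purely for illustration):

```python
class Greeter:
    """Instances of this class can be called like functions."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, name):
        # Invoked when the instance itself is used as a function,
        # e.g. greet("world")
        return f"{self.greeting}, {name}!"


greet = Greeter("Hello")
print(greet("world"))  # prints "Hello, world!"
```

The Hugging Face tokenizer implements `__call__` the same way, so calling the tokenizer instance directly on the source texts is the documented replacement for `prepare_seq2seq_batch` when you only need to encode inputs.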
