
For my use case, I need to use model.forward() instead of the model.generate() method, i.e., instead of the code below:

outs = model.model.generate(input_ids=batch['source_ids'],
                            attention_mask=batch['source_mask'],
                            output_scores=True,
                            max_length=model.model_arguments.max_output_seq_length)

preds_cleaned = [model.tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=True) for ids in outs]

I need to use:

import torch

m = torch.nn.Softmax(dim=-1)  # softmax over the vocabulary dimension

model_outputs = model.model(
    input_ids=batch["source_ids"],
    attention_mask=batch["source_mask"],
    labels=lm_labels.to(device),
    decoder_attention_mask=batch['target_mask']
)
logits = model_outputs.logits
softmax_logits = m(logits)
# torch.max returns (values, indices); the indices are the predicted token ids
max_logits = torch.max(softmax_logits, dim=2)

    

Decoding these logits gives unprocessed text with many issues, such as repetition of words at the end. What do I need to do to get the same result as model.generate()?

NRJ_Varshney

1 Answer


The two methods do something completely different.

Calling the model (i.e., the forward method) uses the labels for teacher forcing. This means the inputs to the decoder are the labels shifted by one position (see the documentation). With teacher forcing, the decoder always gets the ground-truth token as input in the next step, no matter what it predicted. Teacher forcing is used for model training, where all steps need to be fully differentiable.
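
For illustration, here is a minimal sketch of what passing labels does, assuming a Hugging Face T5ForConditionalGeneration (the checkpoint name and the example sentences are placeholders):

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello there", return_tensors="pt")
labels = tokenizer("Hallo zusammen", return_tensors="pt").input_ids

outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)

# With labels passed, the decoder input is the label sequence shifted one
# position to the right, so the logit at position t is conditioned on the
# ground-truth tokens 0..t-1, not on the model's own earlier predictions.
print(outputs.logits.shape)  # (batch, labels_length, vocab_size)
print(outputs.loss)          # teacher-forced cross-entropy, differentiable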

When you call the generate method, the model is used in an autoregressive fashion: every token it generates is fed back as the input in the next step. However, selecting the token is a "hard" decision, and the gradient cannot be propagated through this decision, so the generate method cannot be used for training. The output is coherent because the decoder reacts to what was actually generated before.
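
Roughly, generate() with default (greedy) settings boils down to a loop like the following. This is a simplified sketch that reuses model, tokenizer, and inputs from the snippet above and ignores details such as caching and beam search:

import torch

decoder_input_ids = torch.full((inputs.input_ids.size(0), 1),
                               model.config.decoder_start_token_id,
                               dtype=torch.long)
for _ in range(20):  # an arbitrary maximum length for the sketch
    out = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                decoder_input_ids=decoder_input_ids)
    # "Hard" decision: take the most likely next token...
    next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    # ...and feed it back as the decoder input for the next step.
    decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
    if (next_token == model.config.eos_token_id).all():
        break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))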

With teacher forcing, the model might prefer to generate a certain token and continue consistently with it. However, it cannot continue consistently, because it is forced to continue as if it had generated the token that is actually in the labels argument. This is why you observe incoherent output (which was never intended to be decoded as text, only to be used for training).
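
As a side note: if you want generate()-quality text but also need access to the per-step scores, generate() can return both (output_scores and return_dict_in_generate are standard arguments of the Hugging Face generate method):

gen = model.generate(input_ids=inputs.input_ids,
                     attention_mask=inputs.attention_mask,
                     max_length=20,
                     output_scores=True,
                     return_dict_in_generate=True)
text = tokenizer.decode(gen.sequences[0], skip_special_tokens=True)
scores = gen.scores  # one (batch, vocab_size) score tensor per generated step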

Jindřich
  • Thanks for the response. That explains a lot. So, essentially, for my use case, I need to use inputs_embeds instead of input_ids while generating from the T5 model. model.model() supports that, but the generate() method doesn't. How can I do that? – NRJ_Varshney Apr 30 '21 at 14:55
  • @jindrich, can you please check my question https://stackoverflow.com/questions/72177055/forward-outputs-on-multiple-sequences-is-wrong?noredirect=1#comment127536983_72177055 – LearnToGrow May 11 '22 at 15:05
  • @jindřich does it mean that when we use a forward pass with a label it is not auto-regressive? – Saeed Rahmani Aug 01 '23 at 07:25