
I have a BERT-based encoder model and I want to feed its last hidden state into a GPT2-based decoder model. There is no option in transformers.GPT2Config for using an encoder's last hidden layer as input to GPT2. How do I achieve this?

I want something like this:

from transformers import RobertaForMaskedLM, GPT2LMHeadModel

inputs = {"input_ids": input_ids,
          "token_type_ids": token_type_ids,
          "labels": labels,
          "attention_mask": attention_mask}

encoder           = RobertaForMaskedLM(config=encoder_config)
encoder_output    = encoder(**inputs, output_hidden_states=True)  # expose per-layer states
last_hidden_layer = encoder_output.hidden_states[-1]              # encoder's last layer

decoder           = GPT2LMHeadModel(config=decoder_config)
decoder_output    = decoder(**inputs, encoder_hidden_states=last_hidden_layer)  # the call I want

where last_hidden_layer is fed to the encoder-decoder (cross-) attention of each transformer block in GPT2.
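
If it helps, here is a minimal sketch of the wiring I have in mind. It assumes a transformers version whose GPT2Config accepts add_cross_attention (which should insert a cross-attention layer into every GPT2 block) and whose GPT2LMHeadModel.forward accepts encoder_hidden_states; the "gpt2" checkpoint name is just a placeholder, and I have not verified any of this against my installed version:

from transformers import GPT2Config, GPT2LMHeadModel

# Assumption: add_cross_attention=True gives each GPT2 block a
# cross-attention layer over external encoder states.
decoder_config = GPT2Config.from_pretrained("gpt2", add_cross_attention=True)
decoder        = GPT2LMHeadModel(config=decoder_config)

# Assumption: forward() accepts encoder_hidden_states, so every block
# cross-attends to the encoder's last hidden state.
decoder_output = decoder(
    input_ids=input_ids,
    attention_mask=attention_mask,
    labels=labels,
    encoder_hidden_states=last_hidden_layer,
    encoder_attention_mask=attention_mask,
)

Alternatively, transformers.EncoderDecoderModel is supposed to bolt an encoder and a decoder together and handle the cross-attention wiring itself, e.g. EncoderDecoderModel.from_encoder_decoder_pretrained("roberta-base", "gpt2") — though I have not checked whether GPT2 is a supported decoder in my version.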
