I am working on a text summarization task using an encoder-decoder architecture in Keras. I would like to compare the model's performance with different word embeddings, such as GloVe and BERT. I have already tested it with GloVe embeddings, but I could not find a suitable example of using BERT embeddings in a seq2seq model with Keras. This is an excerpt of my code:
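For reference, this is roughly how I wired in the GloVe embeddings (a minimal self-contained sketch; the toy vocabulary and random vectors stand in for `tokenizer.word_index` and the parsed GloVe file from my real code):

```python
import numpy as np

EMBEDDING_DIM = 100  # placeholder; matches the GloVe file used

# In the real code this comes from tokenizer.word_index;
# a toy vocabulary here keeps the sketch self-contained.
word_index = {'the': 1, 'cat': 2, 'sat': 3}

# embeddings_index maps word -> vector; in the real code it is parsed
# from a GloVe text file, one "word v1 v2 ... v100" line at a time.
embeddings_index = {w: np.random.rand(EMBEDDING_DIM) for w in ['the', 'cat']}

# Rows for words missing from GloVe stay all-zero.
embedding_matrix = np.zeros((len(word_index) + 1, EMBEDDING_DIM))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector

# The matrix is then passed to the encoder's Embedding layer:
# Embedding(Textvocab_size, EMBEDDING_DIM,
#           weights=[embedding_matrix], trainable=False)(encoder_inputs)
```

With GloVe this was straightforward because the pretrained vectors are static and keyed by word, so they drop directly into an `Embedding` layer weight matrix.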
<...>
# splitting the data
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(data['clean_texts'], data['clean_summaries'],
test_size=0.2,shuffle=True,random_state=0)
# prepare a tokenizer for inputs
tokenizer = Tokenizer()
tokenizer.fit_on_texts(Xtrain)
X_train = tokenizer.texts_to_sequences(Xtrain)
X_test = tokenizer.texts_to_sequences(Xtest)
X_train = pad_sequences(X_train, maxlen=MAX_TEXT_LENGTH, padding='post')
X_test = pad_sequences(X_test, maxlen=MAX_TEXT_LENGTH, padding='post')
# prepare a tokenizer for outputs
y_tokenizer = Tokenizer()
y_tokenizer.fit_on_texts(ytrain)
y_train = y_tokenizer.texts_to_sequences(ytrain)
y_test = y_tokenizer.texts_to_sequences(ytest)
y_train = pad_sequences(y_train, maxlen=MAX_SUM_LENGTH, padding='post')
y_test = pad_sequences(y_test, maxlen=MAX_SUM_LENGTH, padding='post')
Textvocab_size = len(tokenizer.word_index) + 1
Sumvocab_size = len(y_tokenizer.word_index) + 1
# Encoder
encoder_inputs = Input(shape=(MAX_TEXT_LENGTH,))
encoder_embedding = Embedding(Textvocab_size, LATENT_DIMENSION, trainable=True)(encoder_inputs)
encoderlstm1 = Bidirectional(LSTM(LATENT_DIMENSION, return_sequences=True, return_state=True))
encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1 = encoderlstm1(encoder_embedding)
state_h1 = Concatenate()([forward_h1, backward_h1])
state_c1 = Concatenate()([forward_c1, backward_c1])
encoder_states1 = [state_h1, state_c1]
<...>
How can I add BERT word embeddings to such a model? I tried this implementation on my data frame before tokenization, but ran into the following error:
AttributeError: 'str' object has no attribute 'device_typeid'
I could not find a solution to it. Is there another way to simply add these word embeddings, as I did with GloVe?
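To clarify what I am trying to achieve: since BERT produces contextual vectors rather than a static word-to-vector table, my understanding is that one option is to precompute the BERT outputs outside the model and feed them to the encoder as dense inputs, dropping the `Embedding` layer. This is only a sketch of that idea; `bert_encode` is a hypothetical stand-in for a real encoder (e.g. `TFBertModel` from the `transformers` library), and the shapes are illustrative:

```python
import numpy as np

MAX_TEXT_LENGTH = 50
BERT_DIM = 768  # hidden size of bert-base

def bert_encode(texts):
    """Hypothetical stand-in: a real version would tokenize with a BERT
    tokenizer and return model(input_ids).last_hidden_state as an array
    of shape (num_texts, MAX_TEXT_LENGTH, BERT_DIM)."""
    return np.random.rand(len(texts), MAX_TEXT_LENGTH, BERT_DIM)

# Precompute dense vectors for the training texts.
X_train_vectors = bert_encode(['first document', 'second document'])

# The encoder would then take dense vectors directly, no Embedding layer:
# encoder_inputs = Input(shape=(MAX_TEXT_LENGTH, BERT_DIM))
# encoder_output1, ... = encoderlstm1(encoder_inputs)
```

Is this the right approach, or is there a way to keep the `Embedding`-layer setup I used for GloVe?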