BERT embeddings for abstractive text summarisation in Keras using encoder-decoder model

Question

I am working on a text summarization task using an encoder-decoder architecture in Keras. I would like to test the model's performance with different word embeddings such as GloVe and BERT. I have already tested it with GloVe embeddings, but I could not find an appropriate example of BERT embeddings in seq2seq models using Keras. This is an excerpt of my code:

<...>
# splitting the data

from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(data['clean_texts'], data['clean_summaries'], 
                                            test_size=0.2,shuffle=True,random_state=0)
# prepare a tokenizer for inputs

tokenizer = Tokenizer()
tokenizer.fit_on_texts(Xtrain) 

X_train = tokenizer.texts_to_sequences(Xtrain)
X_test = tokenizer.texts_to_sequences(Xtest)

X_train = pad_sequences(X_train, maxlen= MAX_TEXT_LENGTH, padding='post')
X_test = pad_sequences(X_test, maxlen= MAX_TEXT_LENGTH, padding='post')

# prepare a tokenizer for outputs

y_tokenizer = Tokenizer()
y_tokenizer.fit_on_texts(ytrain) 

y_train = y_tokenizer.texts_to_sequences(ytrain)
y_test = y_tokenizer.texts_to_sequences(ytest)

y_train = pad_sequences(y_train, maxlen= MAX_SUM_LENGTH, padding='post')
y_test = pad_sequences(y_test, maxlen= MAX_SUM_LENGTH, padding='post')

Textvocab_size = len(tokenizer.word_index) + 1
Sumvocab_size = len(y_tokenizer.word_index) + 1

# Encoder 

encoder_inputs = Input(shape=(MAX_TEXT_LENGTH,))  # same length as used in pad_sequences above
encoder_embedding = Embedding(Textvocab_size, LATENT_DIMENSION,trainable=True)(encoder_inputs) 

encoderlstm1 = Bidirectional(LSTM(LATENT_DIMENSION,return_sequences=True, return_state=True))
encoder_output1, forward_h1, forward_c1, backward_h1, backward_c1 = encoderlstm1(encoder_embedding)
state_h1 = Concatenate()([forward_h1, backward_h1])
state_c1 = Concatenate()([forward_c1, backward_c1])
encoder_states1 = [state_h1, state_c1]

<...>

How can I add BERT word embeddings to such a model? I tried this implementation on my data frame before tokenization, but I ran into an error:

AttributeError: 'str' object has no attribute 'device_typeid'

I could not find a solution to it. Is there another way to add these word embeddings as simply as with GloVe?

score 0 · Answer 1 · answered Feb 01 '21 at 05:23

The error is about the device type: the library needs to know whether it is running on a CPU or a GPU machine, and here it received a string instead.

I use the bert_embedding library for BERT embeddings. This call reproduces the error:

from bert_embedding import BertEmbedding
embedding = BertEmbedding('man')

It fails because the device type is not initialized.

Change the code to

# for CPU
embedding = BertEmbedding()
embed = embedding(['man'])  # pass a list of sentences, not a bare string

# or, for GPU
import mxnet as mx
ctx = mx.gpu(0)
embedding = BertEmbedding(ctx=ctx)
embed = embedding(['man'])
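
To get from there to something a Keras encoder can consume, the per-token vectors need a fixed length per text, just as pad_sequences does for token ids. Below is a minimal sketch of that padding step only, not a full pipeline. The helper name pad_token_vectors and the toy data are made up for illustration. I am also assuming that each result from bert_embedding is a (tokens, vectors) pair with one vector per token, so check that against your installed version.

```python
def pad_token_vectors(token_vectors, max_len, dim):
    """Truncate or zero-pad per-token vectors to exactly max_len rows of size dim."""
    # Keep at most max_len token vectors (truncates long inputs).
    rows = [list(vec) for vec in token_vectors[:max_len]]
    # Append all-zero rows until there are max_len rows (pads short inputs).
    padding = [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows + padding


# Toy stand-in for one sentence's vectors; real BERT-base vectors would use dim=768.
toy_vectors = [[0.1, 0.2], [0.3, 0.4]]
padded = pad_token_vectors(toy_vectors, max_len=3, dim=2)
print(padded)
```

Stacking the padded texts with np.asarray should give an array of shape (num_texts, MAX_TEXT_LENGTH, 768). One option would then be to feed it to the encoder LSTM through Input(shape=(MAX_TEXT_LENGTH, 768)) in place of the Embedding layer. Note that BERT uses its own tokenization, so the positions will not line up with the indices from the Keras Tokenizer.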
