I was trying to implement a sequence-to-sequence language model. During training, the model takes a sequence of 50-dimensional word vectors generated by GloVe and outputs a 1-of-V vector (V is the vocabulary size) representing the next word. At test time, this output can be regarded as the distribution over the next word given the input word vector at the current timestep. I experimented with a 112-word vocabulary.
Then I built two models as follows:
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

# Model 1: LSTM output used directly as the per-timestep prediction
model1 = Sequential()
model1.add(LSTM(112, return_sequences=True, input_shape=(31, 50)))

# Model 2: same LSTM, followed by a Dense layer applied at every timestep
model2 = Sequential()
model2.add(LSTM(112, return_sequences=True, input_shape=(31, 50)))
model2.add(TimeDistributed(Dense(112, activation="linear")))
When I tried to fit them with
model.fit(X, Y, batch_size=128, nb_epoch=256, validation_split=0.1)
the first model, model1, crashed and raised a MemoryError, but the second model, model2, finished normally. X has shape (number_of_sentences, max_words_in_one_sentence, 50), and Y has shape (number_of_sentences, max_words_in_one_sentence, 112). In this example, number_of_sentences=10000 and max_words_in_one_sentence=13.
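For concreteness, the training tensors described above can be sketched with synthetic data (the names and sizes follow the post; the random values are placeholders standing in for real GloVe embeddings and real next-word labels):

```python
import numpy as np

number_of_sentences = 10000
max_words_in_one_sentence = 13
embedding_dim = 50   # GloVe 50d
vocab_size = 112

# X: one 50d word vector per word per sentence
X = np.random.rand(number_of_sentences, max_words_in_one_sentence, embedding_dim)

# Y: one-hot next-word targets, i.e. one 1-of-V vector per timestep
next_word_ids = np.random.randint(
    0, vocab_size, size=(number_of_sentences, max_words_in_one_sentence))
Y = np.eye(vocab_size)[next_word_ids]

print(X.shape)  # (10000, 13, 50)
print(Y.shape)  # (10000, 13, 112)
```

Each row of Y along the last axis sums to 1, which is what lets the model's per-timestep output be interpreted as a distribution over the vocabulary.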
I am wondering what happens when I append a TimeDistributed Dense layer to an LSTM layer, and which of the two models is the one I should use for my language model.