I trained two neural networks with Keras: an MLP and a Bidirectional LSTM.
My task is to predict the word order in a sentence, so for each word the network has to output a real number. When a sentence with N words is processed, the N real numbers in the output are ranked to obtain the integer positions of the words.
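For example, the ranking step can be done like this (a minimal NumPy sketch; the scores are made up just to illustrate):

import numpy as np

# hypothetical scores predicted for a 4-word sentence
scores = np.array([0.7, -1.2, 0.3, 2.1])
# a double argsort turns real-valued scores into integer ranks (0 = first position)
positions = np.argsort(np.argsort(scores))
print(positions)  # [2 0 1 3]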
I'm using the same dataset and the same preprocessing for both models. The only difference is that for the LSTM I padded the sequences so that they all have the same length.
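The padding looks roughly like this (a sketch; `sentences` stands for my list of per-word feature-vector sequences, which is an assumption about the data layout):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# pad every sentence with all-zero vectors up to the length of the longest one
padded = pad_sequences(sentences, padding='post', dtype='float32', value=0.0)
# padded.shape == (num_sentences, timesteps, features)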
In the prediction phase, with the LSTM, I exclude the predictions produced for the padding vectors, since I masked them during training.
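Concretely, the exclusion works roughly like this (a sketch, assuming the padded timesteps are all-zero vectors, consistent with mask_value=0 below):

import numpy as np

preds = model.predict(padded)            # shape: (num_sentences, timesteps, 1)
real = np.any(padded != 0.0, axis=-1)    # True where the timestep is an actual word
clean = [p[m, 0] for p, m in zip(preds, real)]  # per-sentence predictions without padding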
MLP architecture:
from tensorflow import keras

mlp = keras.models.Sequential()
# add input layer
mlp.add(
    keras.layers.Dense(
        units=training_dataset.shape[1],
        input_shape=(training_dataset.shape[1],),
        kernel_initializer=keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None),
        activation='relu')
)
# add hidden layer (input_shape on non-first layers is ignored by Keras)
mlp.add(
    keras.layers.Dense(
        units=training_dataset.shape[1] + 10,
        input_shape=(training_dataset.shape[1] + 10,),
        kernel_initializer=keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None),
        bias_initializer='zeros',
        activation='relu')
)
# add output layer
mlp.add(
    keras.layers.Dense(
        units=1,
        input_shape=(1,),
        kernel_initializer=keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None),
        bias_initializer='zeros',
        activation='linear')
)
Bidirectional LSTM architecture:
import tensorflow as tf
from tensorflow.keras.layers import Masking, Bidirectional, LSTM, Dropout, Dense

model = tf.keras.Sequential()
# mask the all-zero padding timesteps so they are ignored by the layers that follow
model.add(Masking(mask_value=0., input_shape=(timesteps, features)))
model.add(Bidirectional(LSTM(units=20, return_sequences=True), input_shape=(timesteps, features)))
model.add(Dropout(0.2))
# one real-valued output per timestep
model.add(Dense(1, activation='linear'))
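For context, a minimal way to compile and train this model on the padded data (the optimizer, loss, and training settings here are illustrative assumptions, not necessarily my exact setup):

model.compile(optimizer='adam', loss='mse')
# targets: (num_sentences, timesteps, 1); the mask from the Masking layer propagates,
# so the padded timesteps should not contribute to the loss
model.fit(padded, targets, epochs=50, batch_size=32, validation_split=0.1)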
This task should be much better suited to an LSTM, which can capture dependencies between words.
However, with the MLP I get good results, while with the LSTM the results are very bad.
Since I'm a beginner, could someone help me understand what is wrong with my LSTM architecture? I'm going out of my mind.
Thanks in advance.