I'm trying to train an LSTM to reverse an integer sequence. My approach is a modified version of this tutorial, in which he just echoes the input sequence. It goes like this:
- Generate a random sequence S with length R (possible values range from 0 to 99)
- Break the sequence above into sub-sequences of length L (sliding window)
- Each sub-sequence has its reverse as the ground-truth label
So this will generate (R - L + 1) sub-sequences, which form an input matrix of shape (R - L + 1) x L. For example, using:
S = 1 2 3 4 5 ... 25 (1 to 25)
R = 25
L = 5
We end up with 21 sub-sequences:
s1 = 1 2 3 4 5, y1 = 5 4 3 2 1
s2 = 2 3 4 5 6, y2 = 6 5 4 3 2
...
s21 = 21 22 23 24 25, y21 = 25 24 23 22 21
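To make the sliding window concrete, here is a minimal sketch (plain NumPy, before one-hot encoding) of how the pairs for the example above could be built:

import numpy as np

S = list(range(1, 26))  # 1 .. 25, so R = 25
L = 5

x = np.array([S[i:i + L] for i in range(len(S) - L + 1)])
y = np.array([S[i:i + L][::-1] for i in range(len(S) - L + 1)])

print(x.shape)       # (21, 5), i.e. R - L + 1 = 21 sub-sequences of length 5
print(x[0], y[0])    # [1 2 3 4 5] [5 4 3 2 1]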
This input matrix is then one-hot encoded and fed to Keras. Then I repeat the process for another sequence. The problem is that it does not converge; the accuracy is very low. What am I doing wrong?
In the code below I use R = 500 and L = 5, which gives 496 sub-sequences, with batch_size = 16 (so we have 31 updates per 'training session').
Here's the code:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM
from random import randint
from keras.utils.np_utils import to_categorical
import numpy as np
def one_hot_encode(sequence, n_unique=100):
    # Encode each value as a one-hot vector of length n_unique
    encoding = list()
    for value in sequence:
        vector = [0 for _ in range(n_unique)]
        vector[value] = 1
        encoding.append(vector)
    return np.array(encoding)

def one_hot_decode(encoded_seq):
    # Map each one-hot (or softmax) vector back to its integer value
    return [np.argmax(vector) for vector in encoded_seq]

def get_data(rows=500, length=5, n_unique=100):
    # Generate one random sequence of `rows` values, then slide a window of
    # `length` over it; each window's label is its reverse
    s = [randint(0, n_unique-1) for i in range(rows)]
    x = []
    y = []
    for i in range(0, rows-length + 1, 1):
        x.append(one_hot_encode(s[i:i+length], n_unique))
        y.append(one_hot_encode(list(reversed(s[i:i+length])), n_unique))
    return np.array(x), np.array(y)
N = 50000          # number of training sessions, each on a fresh random sequence
LEN = 5
TIMESTEPS = LEN
FEATS = 10         # randint draws values in 0..FEATS-1
BATCH_SIZE = 16    # 496 sub-sequences / 16 = 31 updates per training session
# fit model
model = Sequential()
model.add(LSTM(100, batch_input_shape=(BATCH_SIZE, TIMESTEPS, FEATS), return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(FEATS, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
print(model.summary())
# train LSTM
for epoch in range(N):
    # generate new random sequence
    X, y = get_data(500, LEN, FEATS)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=BATCH_SIZE, verbose=2, shuffle=False)
    model.reset_states()

# evaluate LSTM
X, y = get_data(500, LEN, FEATS)
yhat = model.predict(X, batch_size=BATCH_SIZE, verbose=0)

# decode all pairs
for i in range(len(X)):
    print('Expected:', one_hot_decode(y[i]), 'Predicted', one_hot_decode(yhat[i]))
Thanks!
Edit: It seems like only the last numbers of each sequence are being picked up (a quick per-timestep accuracy check follows the output below):
Expected: [7, 3, 7, 7, 6] Predicted [3, 9, 7, 7, 6]
Expected: [6, 7, 3, 7, 7] Predicted [4, 6, 3, 7, 7]
Expected: [6, 6, 7, 3, 7] Predicted [4, 3, 7, 3, 7]
Expected: [1, 6, 6, 7, 3] Predicted [3, 3, 6, 7, 3]
Expected: [8, 1, 6, 6, 7] Predicted [4, 3, 6, 6, 7]
Expected: [8, 8, 1, 6, 6] Predicted [3, 3, 1, 6, 6]
Expected: [9, 8, 8, 1, 6] Predicted [3, 9, 8, 1, 6]
Expected: [5, 9, 8, 8, 1] Predicted [3, 3, 8, 8, 1]
Expected: [9, 5, 9, 8, 8] Predicted [7, 7, 9, 8, 8]
Expected: [0, 9, 5, 9, 8] Predicted [7, 9, 5, 9, 8]
Expected: [7, 0, 9, 5, 9] Predicted [5, 7, 9, 5, 9]
Expected: [1, 7, 0, 9, 5] Predicted [7, 9, 0, 9, 5]
Expected: [9, 1, 7, 0, 9] Predicted [5, 9, 7, 0, 9]
Expected: [4, 9, 1, 7, 0] Predicted [6, 3, 1, 7, 0]
Expected: [4, 4, 9, 1, 7] Predicted [4, 3, 9, 1, 7]
Expected: [0, 4, 4, 9, 1] Predicted [3, 9, 4, 9, 1]
Expected: [1, 0, 4, 4, 9] Predicted [5, 5, 4, 4, 9]
Expected: [3, 1, 0, 4, 4] Predicted [3, 3, 0, 4, 4]
Expected: [0, 3, 1, 0, 4] Predicted [3, 3, 1, 0, 4]
Expected: [2, 0, 3, 1, 0] Predicted [6, 3, 3, 1, 0]
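To quantify this, a per-timestep accuracy check could be run right after the prediction loop; this is a small diagnostic sketch that reuses y and yhat from the evaluation code above:

# per-timestep accuracy: y and yhat have shape (num_samples, TIMESTEPS, FEATS)
true_idx = np.argmax(y, axis=-1)     # (num_samples, TIMESTEPS)
pred_idx = np.argmax(yhat, axis=-1)  # (num_samples, TIMESTEPS)
per_step_acc = (true_idx == pred_idx).mean(axis=0)
for t, acc in enumerate(per_step_acc):
    print('timestep %d accuracy: %.2f' % (t, acc))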