I am using Keras with the Theano backend for an online handwriting recognition problem, as solved in this paper: http://papers.nips.cc/paper/3213-unconstrained-on-line-handwriting-recognition-with-recurrent-neural-networks.pdf.
I followed the Keras image OCR example (https://github.com/keras-team/keras/blob/master/examples/image_ocr.py) and modified the code to take online handwriting samples instead of images. When training on a dataset of 842 text lines for 200 epochs (~6 minutes per epoch), the CTC log-loss decreases after the first epoch but then stays constant for all remaining epochs. I have also tried different optimizers (SGD, Adam, Adadelta) and learning rates (0.01, 0.1, 0.2), but there is hardly any variation in the loss.
x_train.shape = (842, 1263, 4) [842 text lines, each with 1263 stroke points in 4 dimensions]
y_train.shape = (842, 64) [842 text lines with at most 64 (max_len) characters per line]
number of label classes (len_alphabet) = 66
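As a quick numpy-only sanity check on these shapes (a sketch; the pad value 0 is an assumption, substitute whatever index actually pads y_train): CTC is only defined when each sample's input length is at least its label length, and label_length should count the real characters in a line rather than the padded width of 64.

```python
import numpy as np

# Toy stand-ins; shapes mirror the real arrays described above.
x_train = np.zeros((3, 1263, 4), dtype=np.float32)   # (samples, timesteps, features)
y_train = np.array([[5, 7, 0, 0],                    # padded label sequences;
                    [1, 2, 3, 0],                    # pad value assumed to be 0
                    [4, 0, 0, 0]], dtype=np.int32)

pad_value = 0  # assumption: whatever index pads short lines

# True label length = number of non-pad entries, not the padded width.
label_length = (y_train != pad_value).sum(axis=1)    # -> [2, 3, 1]
input_length = np.full(len(x_train), x_train.shape[1])

# CTC requires the input to be long enough to emit every label
# (plus blanks between repeated characters).
assert (input_length >= label_length).all()
```

With 1263 timesteps against at most 64 characters this constraint holds comfortably here; the per-line `label_length` is what matters.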
Code snapshot:

from keras.layers import (Input, Dense, Activation, Lambda, GRU,
                          Bidirectional, TimeDistributed)
from keras.models import Model
import numpy as np

size = x_train.shape[0]
trainable = True

# Two stacked bidirectional GRU layers over the stroke sequence.
inputs = Input(name='the_input', shape=x_train.shape[1:], dtype='float32')
rnn_encoded = Bidirectional(GRU(64, return_sequences=True),
                            name='bidirectional_1',
                            merge_mode='concat', trainable=trainable)(inputs)
birnn_encoded = Bidirectional(GRU(64, return_sequences=True),
                              name='bidirectional_2',
                              merge_mode='concat', trainable=trainable)(rnn_encoded)
output = TimeDistributed(Dense(66, activation='softmax'))(birnn_encoded)
y_pred = Activation('softmax', name='softmax')(output)

# Extra inputs consumed by the CTC loss.
labels = Input(name='the_labels', shape=[max_len], dtype='int32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [y_pred, labels, input_length, label_length])

model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
# The Lambda layer already computes the CTC loss, so the compiled loss
# just passes its output through.
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='Adadelta')

absolute_max_string_len = max_len
blank_label = len(alphabet) + 1

labels = np.ones([size, absolute_max_string_len])
input_length = np.zeros([size, 1])
label_length = np.zeros([size, 1])
source_str = []
for i in range(size):
    labels[i, :] = y_train[i]
    input_length[i] = x_train.shape[1]
    label_length[i] = len(y_train[i])
    source_str.append('')

inputs_again = {'the_input': x_train,
                'the_labels': labels,
                'input_length': input_length,
                'label_length': label_length,
                'source_str': source_str  # used for visualization only
                }
outputs = {'ctc': np.zeros([size])}
model.fit(inputs_again, outputs, epochs=200, batch_size=25)
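For reference, the snippet calls ctc_lambda_func without showing it; in the image_ocr example it is a thin wrapper around K.ctc_batch_cost, roughly as below. (In image_ocr the first two timesteps of y_pred are also sliced off to discard noisy early outputs after the convolutions; with raw stroke input there are no convolutions, so that slice is omitted here. This is my reconstruction, not necessarily the exact function in my repo.)

```python
from keras import backend as K

def ctc_lambda_func(args):
    # Unpack in the same order the Lambda layer passes them.
    y_pred, labels, input_length, label_length = args
    # Note the argument order of ctc_batch_cost: labels first, then predictions.
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
```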
My complete code is hosted here: https://github.com/aayushee/HWR/blob/master/Run/CTC.py and these are screenshots of the model and the training run: https://github.com/aayushee/HWR/blob/master/Run/model.png https://github.com/aayushee/HWR/blob/master/Run/epochs.png
Please suggest whether the model architecture needs to be modified, whether another optimizer would suit this problem better, or whether something else can fix the issue. Thanks!