
I am using Keras with the Theano backend for an online handwriting recognition problem, as solved in this paper: http://papers.nips.cc/paper/3213-unconstrained-on-line-handwriting-recognition-with-recurrent-neural-networks.pdf.

I followed the Keras image OCR example https://github.com/keras-team/keras/blob/master/examples/image_ocr.py and modified the code to work on online handwriting samples instead of image samples. When training on a dataset of 842 text lines for 200 epochs (~6 minutes per epoch), the CTC log loss decreases only after the first epoch and then stays constant for all remaining epochs. I have also tried different optimizers (SGD, Adam, Adadelta) and learning rates (0.01, 0.1, 0.2), but there is hardly any variation in the loss.

x_train.shape = (842, 1263, 4) [842 text lines with 1263 stroke points, each in 4 dimensions]

y_train.shape = (842, 64) [842 text lines with max_len = 64 characters per line]

Number of label types (len_alphabet) = 66

Code snapshot:

# x_train, y_train, max_len, alphabet and ctc_lambda_func are defined earlier in the full script
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Activation, Lambda, TimeDistributed, GRU, Bidirectional

size = x_train.shape[0]
trainable = True
inputs = Input(name='the_input', shape=x_train.shape[1:], dtype='float32')
rnn_encoded = Bidirectional(GRU(64, return_sequences=True),
                            name='bidirectional_1',
                            merge_mode='concat', trainable=trainable)(inputs)
birnn_encoded = Bidirectional(GRU(64, return_sequences=True),
                              name='bidirectional_2',
                              merge_mode='concat', trainable=trainable)(rnn_encoded)

output = TimeDistributed(Dense(66, activation='softmax'))(birnn_encoded)
y_pred = Activation('softmax', name='softmax')(output)
labels = Input(name='the_labels', shape=[max_len], dtype='int32')
input_length = Input(name='input_length', shape=[1], dtype='int64')
label_length = Input(name='label_length', shape=[1], dtype='int64')
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
model = Model(inputs=[inputs, labels, input_length, label_length], outputs=loss_out)
# the CTC loss is computed inside the Lambda layer, so the compiled loss just passes y_pred through
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='Adadelta')

absolute_max_string_len = max_len
blank_label = len(alphabet) + 1
labels = np.ones([size, absolute_max_string_len])
input_length = np.zeros([size, 1])
label_length = np.zeros([size, 1])
source_str = []
for i in range(x_train.shape[0]):
    labels[i, :] = y_train[i]
    input_length[i] = x_train.shape[1]
    label_length[i] = len(y_train[i])
    source_str.append('')
inputs_again = {'the_input': x_train,
                'the_labels': labels,
                'input_length': input_length,
                'label_length': label_length,
                'source_str': source_str  # used for visualization only
                }
outputs = {'ctc': np.zeros([size])}
model.fit(inputs_again, outputs, epochs=200, batch_size=25)
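For reference, ctc_lambda_func is the usual wrapper around K.ctc_batch_cost from the image_ocr example (sketched below; the exact version is in the full code linked at the end):

from keras import backend as K

def ctc_lambda_func(args):
    # unpack the tensors handed over by the Lambda layer
    y_pred, labels, input_length, label_length = args
    # K.ctc_batch_cost expects (labels, y_pred, input_length, label_length)
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)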

My complete code is hosted here: https://github.com/aayushee/HWR/blob/master/Run/CTC.py and these are the screenshots of the model and training: https://github.com/aayushee/HWR/blob/master/Run/model.png https://github.com/aayushee/HWR/blob/master/Run/epochs.png

Please suggest whether the model architecture needs to be modified, whether some other optimizer would suit this problem better, or if there is something else that can fix the issue. Thanks!

Aayushee
  • I was able to make it work by changing the network architecture: 2 layers with 128 and 64 neurons respectively (see the sketch after these comments). The loss decreases and somehow goes to a negative value after falling rapidly in just 10 epochs! – Aayushee Feb 14 '18 at 10:33
  • I actually have a similar problem; my model doesn't go to a low loss. It finds something like a local minimum (converges to all blank symbols)... Any idea how to solve that? – Stefan Feb 12 '20 at 19:49
  • There could be different solutions depending on your model and data: data normalization, increasing the learning rate, adding hidden layers/units, or trying different activation functions and optimization algorithms. – Aayushee Feb 13 '20 at 05:26
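The architecture change mentioned in the first comment above amounts to roughly the following sketch, with the rest of the model left unchanged:

# sketch of the fix from the comment: first Bidirectional GRU widened to 128 units,
# second kept at 64 units; everything else as in the question
rnn_encoded = Bidirectional(GRU(128, return_sequences=True),
                            name='bidirectional_1',
                            merge_mode='concat', trainable=trainable)(inputs)
birnn_encoded = Bidirectional(GRU(64, return_sequences=True),
                              name='bidirectional_2',
                              merge_mode='concat', trainable=trainable)(rnn_encoded)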

0 Answers