I have a model made up of a CNN, an RNN, and an output layer. My data is an image together with its transcription, and the transcription is padded to a length of 9 characters. For the CTC loss I followed the Keras OCR example, so the layer looks like this:
class CTCLayer(layers.Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.loss_fn = keras.backend.ctc_batch_cost

    def call(self, y_true, y_pred):
        # Compute the training-time loss value and add it to the layer
        # via add_loss().
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")

        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")

        loss = self.loss_fn(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)

        # At inference time, just return the predictions.
        return y_pred
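For context, here is my understanding of the shapes ctc_batch_cost expects, using made-up numbers (2 samples, 20 time steps, 5 character classes plus the CTC blank; none of these numbers are from my real model):

import tensorflow as tf
from tensorflow import keras

# Toy example: batch of 2, 20 time steps, 5 classes + 1 CTC blank.
y_pred = tf.nn.softmax(tf.random.normal((2, 20, 6)), axis=-1)  # (batch, T, classes)
y_true = tf.constant([[1., 2., 3., 0., 0., 0., 0., 0., 0.],
                      [4., 1., 2., 3., 0., 0., 0., 0., 0.]])   # padded to 9
input_length = tf.constant([[20], [20]])  # time steps per sample
label_length = tf.constant([[3], [4]])    # true (unpadded) label lengths
loss = keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
print(loss.shape)  # (2, 1): one loss value per sample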
Now here is how I implemented it:
# l is the number of possible classes / characters
labels = layers.Input(shape=(9,), dtype="float32")
outputs = layers.Dense(l + 1, activation="softmax", name="output")(lstm)
output = CTCLayer()(labels, outputs)

model = Model(inputs=[input_layer, labels], outputs=output)
model.compile(optimizer=optimizers.Adam(0.01))
model.fit([x_train, y_train], y_train, validation_split=0.2, epochs=100)
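One thing I can check (my own addition, not part of the Keras example) is the time-step dimension of the softmax output, since that is what CTCLayer passes on as input_length:

# The second dimension of the softmax output is what CTCLayer hands
# to ctc_batch_cost as input_length.
print(outputs.shape)  # (None, T, l + 1), where T is the number of RNN time steps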
Once I ran model.fit, something weird started happening: I got an inf training loss but a validation loss of around 20. While looking into what might cause this, I came across this post. The accepted answer stated the following:
It's definitely the sequence length of the input that causes the problem. Apparently, the sequence length should be a bit greater than the ground truth length.
What does this mean, and how would I need to change my code to solve the issue I am having?
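For what it's worth, here is my attempt to reproduce what I think the answer describes, again with made-up shapes (5 time steps but 9 labels):

import tensorflow as tf
from tensorflow import keras

# 1 sample, only 5 time steps, 10 classes + 1 CTC blank, but 9 labels to emit.
y_pred = tf.nn.softmax(tf.random.normal((1, 5, 11)), axis=-1)
y_true = tf.constant([[1., 2., 3., 4., 5., 6., 7., 8., 9.]])
loss = keras.backend.ctc_batch_cost(
    y_true, y_pred,
    input_length=tf.constant([[5]]),
    label_length=tf.constant([[9]]),
)
print(loss)  # for me this prints inf: 5 time steps cannot align to 9 labels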