
Network:

Input sequence  -> BiLSTM---------> BiLSTM --------> Dense with softmax
Output shapes:    (None, 5, 256)   (None, 5, 128)      (None, 5, 11)
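
For reference, a minimal sketch of a model that produces these output shapes (only the shapes come from the diagram above; the input feature size and the LSTM unit counts are assumed, keeping in mind that a BiLSTM doubles its unit count in the output):

import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical input: 5 time steps, 32 features per step (feature size is assumed)
inputs = layers.Input(shape=(5, 32))

# Bidirectional LSTM with 128 units -> 2 * 128 = 256 features -> (None, 5, 256)
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)

# Bidirectional LSTM with 64 units -> 2 * 64 = 128 features -> (None, 5, 128)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Dense softmax over 10 classes + 1 CTC blank -> (None, 5, 11)
outputs = layers.Dense(11, activation="softmax")(x)

model = models.Model(inputs, outputs)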

Here is my CTC loss:

import tensorflow as tf

def calculate_ctc_loss(y_true, y_pred):
    # Batch size, number of model time steps, and padded label length
    batch_length = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
    label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")

    # Broadcast the scalar lengths to one entry per sample, shape (batch, 1)
    input_length = input_length * tf.ones(shape=(batch_length, 1), dtype="int64")
    label_length = label_length * tf.ones(shape=(batch_length, 1), dtype="int64")

    loss = tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
    return loss
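
For context, the function has the standard (y_true, y_pred) signature of a custom Keras loss, so it can be passed directly to model.compile. A sketch (the optimizer choice is an assumption; `1e-4` is the learning rate mentioned in the comments below):

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=calculate_ctc_loss,
)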

There are 10 classes in total. For the first batch, with a batch size of 16, the shapes are:

y_true: (16, 7)
y_pred: (16, 5, 11)

I tried to pad the time dimension in y_pred so that its shape becomes (16, 7, 11), but then the loss turned NaN.
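
Concretely, the padding attempt looks roughly like this (a sketch of what is described above, padding the time axis with zeros as mentioned in the comments; the exact call used may have differed):

# Pad y_pred from (16, 5, 11) to (16, 7, 11) by appending two all-zero time steps
y_pred_padded = tf.pad(y_pred, paddings=[[0, 0], [0, 2], [0, 0]])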

Question: How do I correctly pad the time dimension in this case so that y_true and y_pred have compatible shapes for the CTC calculation?

enterML
  • A `nan` loss can have many causes, but the ones I commonly see are (1) a learning rate that is too high or too low, and (2) floating-point issues, which you can guard against by adding a small constant, e.g. `loss = loss + 1e-7`. Can you try experimenting with these first? – Minh-Long Luu Mar 04 '23 at 16:50
  • It isn't related to the learning rate; my lr is already sensible at this point (`1e-4`). The loss turned NaN only when I padded the time dimension with zeros. – enterML Mar 04 '23 at 16:52

0 Answers