ctc_loss error "No valid path found."

Question

Training a model with tf.nn.ctc_loss produces an error every time the train op is run:

tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.

Unlike in previous questions about this function, this is not due to divergence. I have a low learning rate, and the error occurs even on the first train op.

The model is a CNN -> LSTM -> CTC. Here is the model creation code:

# Build Graph
self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32)
self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32)
self.keep_prob = tf.placeholder(dtype=tf.float32)
self.targets = tf.sparse_placeholder(tf.int32)
self.targetLengths = tf.placeholder(shape=(None), dtype=tf.int32)

conv1 = tf.layers.conv3d(self.videoInput ...)
pool1 = tf.layers.max_pooling3d(conv1 ...)
conv2 = ...
pool2 = ...
conv3 = ...
pool3 = ...

cnn_out = tf.reshape(pool3, shape=(-1, self.maxVidLen, 4*7*96))

fw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
bw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
            fw_cell, bw_cell, cnn_out, sequence_length=self.videoLengths, dtype=tf.float32)

outputs = tf.concat(outputs, 2)
outputs = tf.reshape(outputs, [-1, self.hidden_size * 2])

w = tf.Variable(tf.random_normal((self.hidden_size * 2, len(self.char2index) + 1), stddev=0.2))
b = tf.Variable(tf.zeros(len(self.char2index) + 1))

out = tf.matmul(outputs, w) + b
out = tf.reshape(out, [-1, self.maxVidLen, len(self.char2index) + 1])
out = tf.transpose(out, [1, 0, 2])

cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths))
self.train_op = tf.train.AdamOptimizer(0.0001).minimize(cost)

And here is the feed dict creation code:

indices = []
values = []
shape = [len(vids) * 2, self.maxLabelLen]
vidInput = np.zeros((len(vids) * 2, self.maxVidLen, 50, 100, 3), dtype=np.float32)

# Actual video, then left-right flip
for j in range(len(vids) * 2):

    # K is video index
    k = j if j < len(vids) else j - len(vids)

    # convert video and label to input format
    vidInput[j, 0:len(vids[k])] = vids[k] if k == j else vids[k][:,::-1,:]
    indices.extend([j, i] for i in range(len(labelList[k])))
    values.extend(self.char2index[c] for c in labelList[k])

fd[self.targets] = (indices, values, shape)
fd[self.videoInput] = vidInput

# Collect video lengths and label lengths
vidLengths = [len(j) for j in vids] + [len(j) for j in vids]
labelLens = [len(l) for l in labelList] + [len(l) for l in labelList]
fd[self.videoLengths] = vidLengths
fd[self.targetLengths] = labelLens

score 13 · Accepted Answer · edited Nov 27 '18 at 14:23

It turns out that ctc_loss requires each label to be no longer than its input sequence, counting one extra step for every pair of adjacent repeated symbols. If a label is too long for its input, the loss calculator cannot unroll completely and therefore cannot compute the loss.

For example, the label BIFI would require input length of at least 4 while the label BIIF would require input length of at least 5 due to a blank being inserted between the repeated symbols.
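
The two examples follow from a simple count: each label symbol needs one time step, and each pair of adjacent identical symbols needs one extra step for the blank between them. A minimal sketch of that count, where `min_ctc_input_length` is a hypothetical helper name and not part of TensorFlow:

```python
def min_ctc_input_length(label):
    """Smallest number of time steps for which CTC has a valid path for `label`.

    Each symbol needs one step; each pair of adjacent identical symbols
    needs one extra step for the blank that separates them.
    """
    adjacent_repeats = sum(1 for prev, cur in zip(label, label[1:]) if prev == cur)
    return len(label) + adjacent_repeats


print(min_ctc_input_length("BIFI"))  # 4
print(min_ctc_input_length("BIIF"))  # 5
```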

edited Nov 27 '18 at 14:23

user3733083

answered Jul 23 '17 at 14:35

pwp2

Any advice on how to debug this? These seem to be bad examples, but they are difficult to identify when training in batches. – Ciprian Tomoiagă Jan 12 '18 at 09:12
@CiprianTomoiagă: that's pretty simple - you know the length (T) of the RNN output sequence. And you know your ground truth texts, for which you calculate the length (L) and the number of adjacent repeated characters (R). Now you only have to check if L+R<=T. If not, CTC can't compute a loss value for this text and will emit the mentioned warning. Example: T=2, txt1="ab", txt2="aa". L1=2, R1=0 -> L1+R1<=T -> ok. L2=2, R2=1 -> L2+R2>T -> not ok. – Harry Sep 11 '18 at 13:44
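
Harry's check can be run over a whole batch before it is fed, to find the examples that trigger the message. A rough sketch, assuming labels are strings (or lists of ids) and that the input lengths are the per-example RNN output lengths; `find_invalid_ctc_examples` is a hypothetical name, not a TensorFlow function:

```python
def find_invalid_ctc_examples(labels, input_lengths):
    """Return (index, L, R, T) for every example where L + R > T."""
    invalid = []
    for idx, (label, t) in enumerate(zip(labels, input_lengths)):
        length = len(label)
        # Count adjacent identical symbols; each needs one extra blank step.
        repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
        if length + repeats > t:
            invalid.append((idx, length, repeats, t))
    return invalid


# The example from the comment: T=2 for both texts.
bad = find_invalid_ctc_examples(["ab", "aa"], [2, 2])
print(bad)  # [(1, 2, 1, 2)]
```

Examples reported this way can then be dropped, or given longer inputs, before the batch reaches the train op.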

score 4 · Answer 2 · edited May 31 '19 at 03:33

I had the same issue, but I soon realized the cause: I was collecting files with glob and taking the label from the filename, so the label included the directory path and exceeded the input length.

You can fix this issue by using:

os.path.join(*(filename.split(os.path.sep)[noOfDir:]))
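
To illustrate what that expression does: it drops the first `noOfDir` path components, so a label derived from the result no longer contains directory names. A sketch with a made-up path, under the assumption (not stated in the answer) that the label is the file name without its extension:

```python
import os


def strip_leading_dirs(filename, no_of_dir):
    """Drop the first `no_of_dir` components of a path."""
    parts = filename.split(os.path.sep)
    return os.path.join(*parts[no_of_dir:])


# Hypothetical file layout: data/train/BIFI.mp4
path = os.path.join("data", "train", "BIFI.mp4")
relative = strip_leading_dirs(path, 2)
label = os.path.splitext(relative)[0]

print(relative)  # BIFI.mp4
print(label)     # BIFI
```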

edited May 31 '19 at 03:33

Andrew Fan

answered May 30 '19 at 22:31

PRAVENDRA S KHINCHI B16EE026

score 2 · Answer 3 · answered Aug 17 '18 at 16:25

For me the problem was fixed by setting preprocess_collapse_repeated=True.
FWIW: my target sequences were already shorter than the inputs, and the RNN outputs were softmax outputs.
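
It is worth knowing what this flag does: according to the `tf.nn.ctc_loss` documentation, `preprocess_collapse_repeated=True` collapses repeated labels before the CTC calculation, so the target itself changes. A pure-Python illustration of that collapsing (this is not TensorFlow's implementation, and `collapse_repeated` is a hypothetical helper):

```python
from itertools import groupby


def collapse_repeated(label):
    """Merge each run of identical adjacent symbols into a single symbol."""
    return "".join(symbol for symbol, _ in groupby(label))


print(collapse_repeated("BIIF"))  # BIF
```

This removes the extra blank steps that repeats require, but the model is then trained on BIF rather than BIIF, so it probably only fits when repeated symbols in the targets carry no meaning.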

answered Aug 17 '18 at 16:25

Zining Zhu

score 1 · Answer 4 · answered Aug 15 '18 at 05:17

Another possible cause, which I found in my case, is that the input data is not normalized to the 0~1 range. Because of that, the LSTM activation functions saturate at the beginning of training, which somehow causes the "no valid path" log message.
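
If the frames are stored as 8-bit images, scaling them into the 0~1 range before feeding them could look like the sketch below. The uint8 input and the division by 255 are assumptions; the question does not say how the videos are encoded, and `normalize_frames` is a hypothetical helper.

```python
import numpy as np


def normalize_frames(frames):
    """Scale uint8 pixel values (0..255) to float32 values in [0, 1]."""
    return frames.astype(np.float32) / 255.0


# Hypothetical clip: 5 frames of 50x100 RGB, matching the placeholder shape.
clip = np.random.randint(0, 256, size=(5, 50, 100, 3), dtype=np.uint8)
scaled = normalize_frames(clip)
print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
```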

answered Aug 15 '18 at 05:17

TingQian LI
