How to use tf.nn.ctc_loss in cnn+ctc network

Question

Recently, I have been trying to use TensorFlow to implement a CNN+CTC network based on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks.

I feed a batch of spectrogram data (shape: (10, 120, 155, 3), batch_size is 10) into 10 convolution layers and 3 fully connected layers. So the output before the CTC layer is 2D data (shape: (10, 1024)).

Here is my problem: I want to use the tf.nn.ctc_loss function from the TensorFlow library, but it raises ValueError: Dimension must be 2 but is 3 for 'transpose'(op:'Transpose') with input shapes:[?,1024],[3].

I guess the error is related to the dimension of my 2D input data. The description of the ctc_loss function on the official TensorFlow site says it requires a 3D input with the shape (batch_size x max_time x num_classes).

So, what is the extra 'num_classes' dimension? How should I change the shape of my CNN+FC output data?

You can check code at https://github.com/mozilla/DeepSpeech/blob/master/DeepSpeech.py for details how to use ctc.loss. To get help on your particular issue you need to show the actual code you wrote. — Nikolay Shmyrev, Jun 26 '17 at 16:20
Should be ([batch, sequence_length, distribution_over_symbols]). — Konstantinos Monachopoulos, Jun 25 '18 at 14:01

score 2 · Answer 1 · answered Aug 27 '17 at 21:52

The fully connected layers should be applied per time step, just like applying the same dense layer at every time step in a recurrent neural network. For the output of a convolution layer, the time axis is the width.

So, for example, the output shapes would be:

convolution: (10,120,155,3) = (batch, height, width, channels)
flatten: (10, 155, 120*3) = (batch, max_time, features)
fully connected: (10, 155, 1024), (same dense layer applied per time step)
output layer: (10, 155, num_classes)

This is the 3D shape, (batch, max_time, num_classes), that ctc_loss in TensorFlow expects for batch-major input.
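The shape bookkeeping in this answer can be sketched with plain NumPy as a stand-in for the TensorFlow layers. This is only an illustration under assumptions: `num_classes = 29` is a hypothetical value, the weights are random, and the feature map reuses the (10, 120, 155, 3) shape from the question. Note that width has to be moved next to the batch axis before flattening; a bare reshape would mix time steps together.

```python
import numpy as np

# Shapes from the question; num_classes is a hypothetical value
# (for example 28 output symbols plus one CTC blank).
batch, height, width, channels = 10, 120, 155, 3
hidden_units, num_classes = 1024, 29

rng = np.random.default_rng(0)
conv_out = rng.standard_normal((batch, height, width, channels))

# Move width next to batch so it becomes the time axis:
# (batch, height, width, channels) -> (batch, width, height, channels)
time_first = conv_out.transpose(0, 2, 1, 3)

# Merge height and channels into one feature vector per time step:
# (batch, max_time, features) = (10, 155, 360)
features = time_first.reshape(batch, width, height * channels)

# One shared weight matrix per layer; matmul acts on the last axis only,
# so every time step goes through the same dense layer.
w_hidden = rng.standard_normal((height * channels, hidden_units)) * 0.01
w_out = rng.standard_normal((hidden_units, num_classes)) * 0.01

hidden = np.maximum(features @ w_hidden, 0.0)  # (10, 155, 1024), ReLU
logits = hidden @ w_out                        # (10, 155, num_classes)

print(features.shape, hidden.shape, logits.shape)
```

In TensorFlow the same steps would map to `tf.transpose`, `tf.reshape`, and `tf.keras.layers.Dense`, which also applies its weights along the last axis of a 3D input.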

Actually this is wrong. Question was about `tf.nn.ctc_loss`. By default `tf.nn.ctc_loss` accepts shape `[frames, batch_size, num_labels]`. — Alexander Zot, Jul 23 '21 at 11:44
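The layout point raised in this comment can be checked with a small sketch. It assumes the TensorFlow 2.x signature of `tf.nn.ctc_loss` (dense labels, `logits_time_major` flag), which differs from the 1.x version the question was written against; the label length of 20, `num_classes = 29`, and the random logits are hypothetical placeholders.

```python
import tensorflow as tf

# Hypothetical sizes: 155 time steps as in the question, 28 symbols + 1 blank.
batch, max_time, num_classes, label_len = 10, 155, 29, 20

# Batch-major logits, as produced by a per-time-step dense layer.
logits = tf.random.normal((batch, max_time, num_classes))

# Dense integer labels in [0, num_classes - 2]; the last class is the blank.
labels = tf.random.uniform(
    (batch, label_len), minval=0, maxval=num_classes - 1, dtype=tf.int32)
label_length = tf.fill([batch], label_len)
logit_length = tf.fill([batch], max_time)

# Option 1: keep (batch, max_time, num_classes) and say so explicitly.
loss_batch_major = tf.nn.ctc_loss(
    labels=labels, logits=logits,
    label_length=label_length, logit_length=logit_length,
    logits_time_major=False, blank_index=-1)

# Option 2: transpose to the default time-major layout
# (max_time, batch, num_classes).
logits_time_major = tf.transpose(logits, perm=[1, 0, 2])
loss_time_major = tf.nn.ctc_loss(
    labels=labels, logits=logits_time_major,
    label_length=label_length, logit_length=logit_length,
    logits_time_major=True, blank_index=-1)

print(loss_batch_major.shape, loss_time_major.shape)  # one loss per example
```

Both calls should give the same per-example losses. In TensorFlow 1.x the corresponding flag is named `time_major` and the labels are passed as a `SparseTensor`, so the call would need adapting there.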
