What's the difference between tf.nn.ctc_loss and pytorch.nn.CTCLoss?

Question

For the same input and label:

the output of pytorch.nn.CTCLoss is 5.74,
the output of tf.nn.ctc_loss is 129.69,
but the output of math.log(tf ctc loss) is 4.86

So what is the difference between pytorch.nn.CTCLoss and tf.nn.ctc_loss?

tf: 1.13.1
pytorch: 1.1.0

I have tried the following:

1- applying log_softmax to the input and passing the result to pytorch.nn.CTCLoss,
2- applying tf.nn.log_softmax to the input and passing the result to tf.nn.ctc_loss,
3- passing the input directly to tf.nn.ctc_loss,
4- passing the input directly to tf.nn.ctc_loss and then taking math.log of its output.

In cases 2, 3, and 4, the result differs from that of pytorch.nn.CTCLoss.

from torch import nn
import torch
import tensorflow as tf
import math

time_step = 50  # Input sequence length
vocab_size = 20  # Number of classes
batch_size = 16  # Batch size
target_sequence_length = 30  # Target sequence length


def dense_to_sparse(dense_tensor, sequence_length):
    indices = tf.where(tf.sequence_mask(sequence_length))
    values = tf.gather_nd(dense_tensor, indices)
    shape = tf.shape(dense_tensor, out_type=tf.int64)
    return tf.SparseTensor(indices, values, shape)


def compute_loss(x, y, x_len):
    ctclosses = tf.nn.ctc_loss(
        y,
        tf.cast(x, dtype=tf.float32),
        x_len,
        preprocess_collapse_repeated=False,
        ctc_merge_repeated=False,
        ignore_longer_outputs_than_inputs=False
    )
    ctclosses = tf.reduce_mean(ctclosses)

    with tf.Session() as sess:
        ctclosses = sess.run(ctclosses)
        print(f"tf ctc loss: {ctclosses}")
        print(f"tf log(ctc loss): {math.log(ctclosses)}")


minimum_target_length = 10

ctc_loss = nn.CTCLoss(blank=vocab_size - 1)
x = torch.randn(time_step, batch_size, vocab_size)  # [size] = T,N,C
y = torch.randint(0, vocab_size - 2, (batch_size, target_sequence_length), dtype=torch.long)  # low, high, [size]

x_lengths = torch.full((batch_size,), time_step, dtype=torch.long)  # Length of inputs
y_lengths = torch.randint(minimum_target_length, target_sequence_length, (batch_size,),
                          dtype=torch.long)  # Length of targets can be variable (even if target sequences are constant length)

loss = ctc_loss(x.log_softmax(2).detach(), y, x_lengths, y_lengths)
print(f"torch ctc loss: {loss}")

x = x.numpy()
y = y.numpy()
x_lengths = x_lengths.numpy()
y_lengths = y_lengths.numpy()
x = tf.cast(x, dtype=tf.float32)
y = tf.cast(dense_to_sparse(y, y_lengths), dtype=tf.int32)
compute_loss(x, y, x_lengths)

I expected tf.nn.ctc_loss to return the same value as pytorch.nn.CTCLoss, but it does not. How can I make the two match?

score 4 · Answer 1 · answered Mar 04 '20 at 17:46

PyTorch's CTCLoss with its default reduction='mean' does not do the same thing as computing the individual losses and then averaging them (which is what your TensorFlow code does). From the PyTorch documentation for CTCLoss:

``'mean'``: the output losses will be divided by the target lengths and
            then the mean over the batch is taken.
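To make the difference concrete, here is a minimal sketch in plain Python. The per-sample losses and target lengths are made-up illustrative numbers, not values produced by the code in the question, and the function names are hypothetical helpers that only mimic the two reductions as described above.

```python
# Illustrative per-sample CTC losses and target lengths (made-up numbers).
per_sample_losses = [120.0, 90.0, 150.0, 100.0]
target_lengths = [24, 15, 30, 20]


def pytorch_style_mean(losses, lengths):
    """Mimic reduction='mean' as the PyTorch docs describe it:
    divide each loss by its target length, then average over the batch."""
    normalized = [loss / length for loss, length in zip(losses, lengths)]
    return sum(normalized) / len(normalized)


def plain_batch_mean(losses):
    """Mimic tf.reduce_mean over the per-sample losses, which equals
    the sum of the losses divided by the batch size."""
    return sum(losses) / len(losses)


print(pytorch_style_mean(per_sample_losses, target_lengths))  # 5.25
print(plain_batch_mean(per_sample_losses))                    # 115.0
```

The two numbers differ by roughly the typical target length, which is the same order of gap as between 5.74 and 129.69 in the question.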

To obtain the same value:

1- Change the reduction method to sum:

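# Keep blank=vocab_size - 1 from the question: tf.nn.ctc_loss uses num_classes - 1 as the blank index.
ctc_loss = nn.CTCLoss(blank=vocab_size - 1, reduction='sum')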

2- Divide the resulting loss by batch_size:

loss = ctc_loss(x.log_softmax(2).detach(), y, x_lengths, y_lengths)
loss = loss.item() / batch_size

3- Set TensorFlow's ctc_merge_repeated parameter to True (I am assuming PyTorch's CTC merges repeated labels as well):

ctclosses = tf.nn.ctc_loss(
    y,
    tf.cast(x, dtype=tf.float32),
    x_len,
    preprocess_collapse_repeated=False,
    ctc_merge_repeated=True,
    ignore_longer_outputs_than_inputs=False
)
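To see why the merging flag changes the value, here is a brute-force sketch in plain Python for a tiny input. The function names `collapse` and `brute_force_ctc_loss` are hypothetical, and the `merge_repeated=False` branch reflects my reading of the TensorFlow documentation (repeated labels are treated as individual labels), not TensorFlow's actual implementation. It enumerates every path, so it is only usable on toy sizes.

```python
import itertools
import math


def collapse(path, blank, merge_repeated=True):
    """Turn a frame-level path into a label sequence."""
    labels = []
    previous = None
    for symbol in path:
        # Standard CTC skips a symbol that repeats the previous frame.
        is_repeat = merge_repeated and symbol == previous
        if symbol != blank and not is_repeat:
            labels.append(symbol)
        previous = symbol
    return labels


def brute_force_ctc_loss(probs, target, blank, merge_repeated=True):
    """Negative log of the summed probability of every path that collapses
    to `target`. probs[t][c] is the probability of class c at frame t."""
    num_classes = len(probs[0])
    total = 0.0
    for path in itertools.product(range(num_classes), repeat=len(probs)):
        if collapse(path, blank, merge_repeated) == list(target):
            path_prob = 1.0
            for t, symbol in enumerate(path):
                path_prob *= probs[t][symbol]
            total += path_prob
    return -math.log(total)


# Three frames, three classes (class 2 is the blank), uniform probabilities.
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]
merged = brute_force_ctc_loss(uniform, [0], blank=2, merge_repeated=True)
unmerged = brute_force_ctc_loss(uniform, [0], blank=2, merge_repeated=False)
print(merged, unmerged)
```

In this toy case six of the 27 paths collapse to the target when repeats are merged, but only three when they are not, so the merged loss is lower. The two settings therefore cannot be expected to agree on the same input.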

You will now get very close results from PyTorch and TensorFlow (without taking the log of the TensorFlow value). Any small difference that remains probably comes from slight differences between the two implementations. In my last three runs, I got the following values:

pytorch loss : 113.33 vs tf loss = 113.52
pytorch loss : 116.30 vs tf loss = 115.57
pytorch loss : 115.67 vs tf loss = 114.54
