
Environment: TensorFlow 2.2 on Windows 10 x64, CPU only, using tf.keras.

I want to build a simple model for image-to-text recognition (OCR).

For this I use a CRNN architecture with the CTC loss function.

There is a nice tf.nn.ctc_loss function that suits my purpose. Note: I cannot use tf.keras.backend.ctc_batch_cost because it has no blank_index argument, although internally it is just a wrapper around TensorFlow's CTC loss.

The problem is that tf.nn.ctc_loss needs at least four arguments to work:

  • batch of predicted logits
  • logits lengths
  • batch of true sequences
  • true sequences lengths
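
To make the role of these four inputs (plus the blank index) concrete, here is a minimal pure-Python sketch of what a CTC loss computes for a single sample. It is only an illustration, not TensorFlow's implementation: the function name ctc_nll is made up for this example, it takes probabilities rather than logits, and it assumes at least one valid time step.

```python
import math


def ctc_nll(probs, logit_length, labels, label_length, blank_index):
    """Negative log-likelihood of one label sequence under CTC.

    probs:        per-time-step class probabilities, shape (T_max, n_classes)
    logit_length: number of valid (unpadded) time steps in probs, at least 1
    labels:       padded list of class ids
    label_length: number of valid ids in labels
    """
    target = labels[:label_length]
    # Extended sequence with blanks interleaved: b, l1, b, l2, ..., b
    ext = [blank_index]
    for c in target:
        ext += [c, blank_index]
    n_states = len(ext)

    # Forward variables for the first time step
    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]

    # Only the first logit_length time steps are used; padding is ignored
    for t in range(1, logit_length):
        new_alpha = [0.0] * n_states
        for s in range(n_states):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels
            if s >= 2 and ext[s] != blank_index and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or in the trailing blank
    p = alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)
    return -math.log(p) if p > 0 else math.inf


# Two time steps, three classes (class 2 is the blank), target sequence [0]
uniform = [[1 / 3] * 3, [1 / 3] * 3]
print(ctc_nll(uniform, 2, [0], 1, blank_index=2))  # close to math.log(3)
```

With uniform probabilities, three of the nine possible paths (0 0, 0 blank, blank 0) collapse to the target, which is where the value log(3) comes from.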

My dataset returns: (images, image_widths, true_sequences, true_sequence_lengths). I would like to use these values along with predicted logits to calculate the loss value.

But TensorFlow is hard to customize. At first I wrote my own class that subclassed tf.keras.losses.Loss. I then also had to subclass tf.keras.Model and override its train_step and test_step methods, because Keras refused to pass all the required inputs through unchanged.

That worked until I saved the model and tried to load it from another language (Java). Loading failed because the saved model depended on my custom classes, which are unknown there.

So I went back to the functional API. My plan now is to compute the loss as an output of the model during training and, when exporting the model graph, to save only the computation up to the output probabilities layer.

Here is how I create the model and compute the CTC loss:

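import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam


def KerasLighterConvGRU(n_classes: int, blank_class: int, height=32, in_channels=3, rnn_hidden_size=64, training=True):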

    # Input
    images = layers.Input(shape=(height, None, in_channels), dtype='float32', name='images')
    # (B, 32, W, 3)

    model = tf.keras.Sequential([
        # Convolution layer (VGG)
        layers.Conv2D(64, (5, 5), padding='same', name='conv1', kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='max1'),
        # (B, 16, W/2, 64)
        
        layers.Conv2D(128, (3, 3), padding='same', name='conv2', kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='max2'),
        # (B, 8, W/4, 128)
        
        layers.Conv2D(128, (3, 3), padding='same', name='conv3', kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='max3'),
        # (B, 4, W/4, 128)
        
        layers.DepthwiseConv2D((3, 3), padding='same', depth_multiplier=2, kernel_initializer='he_normal', name='dconv4'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='max4'),
        # (B, 2, W/4, 256)
        
        layers.Conv2D(128, (1, 1), padding='same', kernel_initializer='he_normal', name='conv5'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # (B, 2, W/4, 128)
        
        layers.MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='max5'),
        # (B, 1, W/4, 128)
        
        # CNN to RNN
        layers.Lambda(lambda x: K.squeeze(x, axis=1), name='cnn_to_rnn'),
        # (B, W/4, 128)
        
        # RNN layer
        layers.Bidirectional(layers.GRU(rnn_hidden_size, return_sequences=True, kernel_initializer='he_normal'),
            merge_mode='sum', name='rnn1'),
        
        layers.Bidirectional(layers.GRU(rnn_hidden_size, return_sequences=True, kernel_initializer='he_normal'),
            merge_mode='concat', name='rnn2'),
        # (B, W/4, 2*rnn_hidden_size)
        
        layers.Dense(n_classes, kernel_initializer='he_normal', name='dense2'),
        layers.Activation('softmax', name='output'),
    ])
    
    y_probs = model(images)
    
    if training:
        targets = layers.Input(name='targets', shape=(None,), dtype='int32')  # (B, ?)
        target_lengths = layers.Input(name='target_lengths', shape=[], dtype='int64')  # (B)
        blank_class = K.constant(blank_class, dtype='int32', shape=None, name=None)
        loss = layers.Lambda(lambda x: ctc_loss(x), name='ctc_loss')((y_probs, targets, target_lengths, blank_class))
        return tf.keras.Model(inputs=[images, targets, target_lengths], outputs=loss)

    else:
        return tf.keras.Model(inputs=[images], outputs=y_probs)


def ctc_loss(args):
    y_probs, targets, target_lengths, blank_index = args
    # tf.nn.ctc_loss expects logits, so feed it log(probabilities)
    logits = K.log(y_probs)
    # Fake logit lengths: treat every sequence as using the full padded time dimension
    logit_lengths = K.ones((K.shape(logits)[0],), dtype='int32') * K.shape(logits)[1]
    # Return a batch of CTC loss values
    return tf.nn.ctc_loss(
        labels=targets,
        logits=logits,
        label_length=target_lengths,
        logit_length=logit_lengths,
        blank_index=blank_index,
        logits_time_major=False
    )
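
The fake logit_lengths vector above treats every padded time step as valid. Since the dataset already provides image_widths and only max1 and max2 halve the width, real per-sample lengths could be derived from them. The following is a minimal sketch under that assumption; the helper name logit_lengths_from_widths is hypothetical, and it presumes image_widths holds the unpadded widths in pixels.

```python
def logit_lengths_from_widths(image_widths, n_halvings=2):
    """Map unpadded image widths to the number of valid time steps.

    n_halvings=2 mirrors the two width-halving pooling layers (max1, max2);
    the remaining pooling layers only reduce the height.
    """
    lengths = []
    for width in image_widths:
        for _ in range(n_halvings):
            # MaxPooling2D with its default 'valid' padding floors the result
            width //= 2
        lengths.append(width)
    return lengths


print(logit_lengths_from_widths([100, 37, 8]))  # [25, 9, 2]
```

Using such values would require feeding image_widths into the model as one more input, which the code above does not do.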

The model is initialized in this way:

    model = KerasLighterConvGRU(n_classes=train_dataset.n_classes, height=height, blank_class=blank_class, training=True)
    
    # The loss calculation occurs elsewhere, so use a dummy lambda func for the loss
    model.compile(
        optimizer=Adam(learning_rate=learning_rate),
        loss={'ctc_loss': lambda y_true, y_pred: y_pred}
    )
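
Because the training model takes three named inputs and its only output is the loss itself, each dataset batch has to be repacked before it can be passed to fit. Below is a rough sketch of that repacking, assuming the batch is the 4-tuple described earlier; the helper name to_keras_batch is hypothetical, and plain Python lists stand in for real arrays or tensors.

```python
def to_keras_batch(batch):
    """Repack one dataset batch into (inputs, dummy_targets).

    batch is assumed to be the tuple
    (images, image_widths, true_sequences, true_sequence_lengths).
    """
    images, image_widths, true_sequences, true_sequence_lengths = batch
    # Keys match the names given to the Input layers of the training model
    inputs = {
        'images': images,
        'targets': true_sequences,
        'target_lengths': true_sequence_lengths,
    }
    # The model output already is the loss and the dummy loss function
    # ignores y_true, so these values are placeholders only
    dummy_targets = [0.0] * len(images)
    return inputs, dummy_targets


# Toy batch with two "images" (placeholders instead of pixel data)
inputs, y = to_keras_batch((['img_a', 'img_b'], [100, 37], [[1, 2], [3, 0]], [2, 1]))
print(sorted(inputs), y)  # ['images', 'target_lengths', 'targets'] [0.0, 0.0]
```

image_widths is dropped here only because the model as written has no input for it.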

But model.compile(...) results in an error: RuntimeError: Attempting to capture an EagerTensor without building a function.

It does not like this particular line of code:

loss = layers.Lambda(lambda x: ctc_loss(x), name='ctc_loss')((y_probs, targets, target_lengths, blank_class))

I already tried the answers to the question "RuntimeError: Attempting to capture an EagerTensor without building a function", but they didn't help.

Why does this not work, and how can I fix it?

Pavel Chernov