Environment: TensorFlow 2.2 on Windows 10 x64, CPU-only, using tf.keras.
I want to build a simple model for image-to-text recognition (OCR).
For this I use a CRNN architecture with a CTC loss function.
There is a nice tf.nn.ctc_loss function that suits my purpose. Note: I cannot use tf.keras.backend.ctc_batch_cost because it has no blank_index argument, although internally it just calls tf.nn.ctc_loss.
The problem is that it takes at least 4 arguments to work:
- batch of predicted logits
- logits lengths
- batch of true sequences
- true sequences lengths
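To make the role of these four inputs and of blank_index concrete, here is a minimal pure-Python sketch of the CTC negative log-likelihood for a single sample, using the standard forward recursion. It is an illustration only and is not how tf.nn.ctc_loss is implemented; the function name ctc_neg_log_likelihood and its argument layout are hypothetical.

```python
import math


def ctc_neg_log_likelihood(log_probs, logit_length, labels, label_length, blank_index):
    """CTC loss for one sample (hypothetical helper, for illustration only).

    log_probs: list of T rows, each a list of C log-probabilities.
    Only the first `logit_length` rows (assumed >= 1) and the first
    `label_length` labels are used; the rest is treated as padding.
    """
    neg_inf = float("-inf")

    def logsumexp(values):
        m = max(values)
        if m == neg_inf:
            return neg_inf
        return m + math.log(sum(math.exp(v - m) for v in values))

    # Extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank_index]
    for label in labels[:label_length]:
        ext += [label, blank_index]
    size = len(ext)

    # alpha[s]: log-probability of all partial alignments ending in ext[s]
    alpha = [neg_inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, logit_length):
        new_alpha = [neg_inf] * size
        for s in range(size):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # A blank may be skipped only between two different labels
            if s >= 2 and ext[s] != blank_index and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last label or in the trailing blank
    finals = [alpha[size - 1]]
    if size > 1:
        finals.append(alpha[size - 2])
    return -logsumexp(finals)


# Two classes with uniform probabilities; class 1 is the blank.
# The third row and the second label are padding and are ignored.
uniform = [[math.log(0.5)] * 2 for _ in range(3)]
loss = ctc_neg_log_likelihood(uniform, 2, [0, 0], 1, blank_index=1)
```

In this toy case three of the four equally likely two-step paths ("aa", "a-", "-a") decode to the single label, so the loss equals -log(0.75).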
My dataset returns: (images, image_widths, true_sequences, true_sequence_lengths).
I would like to use these values along with predicted logits to calculate the loss value.
But TensorFlow is hard to customize. So at first I created my own class that subclassed tf.keras.losses.Loss. Then I also had to create a subclass of tf.keras.Model with overridden train_step and test_step methods, because TensorFlow refused to pass all the required inputs through unchanged.
And it worked, until I saved my model and tried to use it from another language (Java). Loading the saved model failed because the Java side didn't know about my custom classes.
So I had to return to the functional way of creating a model. Now my plan is to compute the loss as an output of the model during training, and, when exporting the model graph, to save only the computation up to the output probabilities layer.
Here is how I create the model and compute the CTC loss:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras import layers


def KerasLighterConvGRU(n_classes: int, blank_class: int, height=32, in_channels=3, rnn_hidden_size=64, training=True):
    # Input
    images = layers.Input(shape=(height, None, in_channels), dtype='float32', name='images')
    # (B, 32, W, 3)
    model = tf.keras.Sequential([
        # Convolution layer (VGG)
        layers.Conv2D(64, (5, 5), padding='same', name='conv1', kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='max1'),
        # (B, 16, W/2, 64)
        layers.Conv2D(128, (3, 3), padding='same', name='conv2', kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='max2'),
        # (B, 8, W/4, 128)
        layers.Conv2D(128, (3, 3), padding='same', name='conv3', kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='max3'),
        # (B, 4, W/4, 128)
        layers.DepthwiseConv2D((3, 3), padding='same', depth_multiplier=2, kernel_initializer='he_normal', name='dconv4'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        layers.MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='max4'),
        # (B, 2, W/4, 256)
        layers.Conv2D(128, (1, 1), padding='same', kernel_initializer='he_normal', name='conv5'),
        layers.BatchNormalization(),
        layers.LeakyReLU(),
        # (B, 2, W/4, 128)
        layers.MaxPooling2D(pool_size=(2, 1), strides=(2, 1), name='max5'),
        # (B, 1, W/4, 128)
        # CNN to RNN
        layers.Lambda(lambda x: K.squeeze(x, axis=1), name='cnn_to_rnn'),
        # (B, W/4, 128)
        # RNN layer
        layers.Bidirectional(layers.GRU(rnn_hidden_size, return_sequences=True, kernel_initializer='he_normal'),
                             merge_mode='sum', name='rnn1'),
        layers.Bidirectional(layers.GRU(rnn_hidden_size, return_sequences=True, kernel_initializer='he_normal'),
                             merge_mode='concat', name='rnn2'),
        # (B, W/4, 2*rnn_hidden_size)
        layers.Dense(n_classes, kernel_initializer='he_normal', name='dense2'),
        layers.Activation('softmax', name='output'),
    ])
    y_probs = model(images)
    if training:
        targets = layers.Input(name='targets', shape=(None,), dtype='int32')  # (B, ?)
        target_lengths = layers.Input(name='target_lengths', shape=[], dtype='int64')  # (B)
        blank_class = K.constant(blank_class, dtype='int32', shape=None, name=None)
        loss = layers.Lambda(lambda x: ctc_loss(x), name='ctc_loss')((y_probs, targets, target_lengths, blank_class))
        return tf.keras.Model(inputs=[images, targets, target_lengths], outputs=loss)
    else:
        return tf.keras.Model(inputs=[images], outputs=y_probs)


def ctc_loss(args):
    y_probs, targets, target_lengths, blank_index = args
    # Compute log(probabilities)
    logits = K.log(y_probs)
    # Make a fake logit lengths vector as the maximum length of a predicted sequence
    logit_lengths = K.ones((K.shape(logits)[0],), dtype='int32') * K.shape(logits)[1]
    # Return a batch of CTC loss values
    return tf.nn.ctc_loss(
        labels=targets,
        logits=logits,
        label_length=target_lengths,
        logit_length=logit_lengths,
        blank_index=blank_index,
        logits_time_major=False
    )
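The logit_lengths vector above marks every time step as valid, including steps that fall over padded image columns. Because the dataset also yields image_widths, the per-sample lengths could instead be derived from them. This is a minimal sketch, assuming the widths are the unpadded pixel widths and that only max1 and max2 shrink the width axis (each a floor-divide by 2, as MaxPooling2D does with its default 'valid' padding). The function name logit_lengths_from_widths is hypothetical.

```python
def logit_lengths_from_widths(image_widths, width_strides=(2, 2)):
    """Estimate the number of valid time steps for each unpadded image width.

    width_strides lists the width-axis strides of the pooling layers that
    shrink the width (assumed here to be max1 and max2 only).
    """
    lengths = []
    for width in image_widths:
        for stride in width_strides:
            # 'valid' pooling with pool size equal to stride floors the size
            width = width // stride
        lengths.append(width)
    return lengths


# Three images of different widths from one padded batch
example_lengths = logit_lengths_from_widths([100, 37, 8])
```

The resulting values would then be fed as an extra model input in place of the constant vector. Whether this improves training depends on how much padding the batches contain.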
The model is initialized like this:
model = KerasLighterConvGRU(n_classes=train_dataset.n_classes, height=height, blank_class=blank_class, training=True)
# The loss calculation occurs elsewhere, so use a dummy lambda func for the loss
model.compile(
    optimizer=Adam(learning_rate=learning_rate),
    loss={'ctc_loss': lambda y_true, y_pred: y_pred}
)
But model.compile(...) results in an error:
RuntimeError: Attempting to capture an EagerTensor without building a function.
It does not like this particular line of code:
loss = layers.Lambda(lambda x: ctc_loss(x), name='ctc_loss')((y_probs, targets, target_lengths, blank_class))
I already tried the answers from the question "RuntimeError: Attempting to capture an EagerTensor without building a function", but they didn't help.
Please help me understand why this is not working and how to fix it.