
My task is to find a certain letter in a picture of a document. Using classical computer vision I have segmented the image into characters. Then I used a neural network, trained on 25×25 pixel images of characters, to classify each one as either the letter that I want or anything else. Using this I can reconstruct the locations of these characters.

Now I want to apply the convnet directly to the whole image so that I do not have to rely on the classical segmentation. The network is a deep neural network consisting of 2D convolutions, 2D max-pooling layers, and then a dense classifier. The network looks like this:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_61 (Conv2D)           (None, 23, 23, 32)        320       
_________________________________________________________________
max_pooling2d_50 (MaxPooling (None, 11, 11, 32)        0         
_________________________________________________________________
conv2d_62 (Conv2D)           (None, 9, 9, 64)          18496     
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_46 (Flatten)         (None, 1024)              0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_89 (Dense)             (None, 1)                 1025      
=================================================================
Total params: 19,841
Trainable params: 19,841
Non-trainable params: 0

I know that I can apply the convolutional part to the whole image with the trained filters. This would give me the response to these filters in the form of a tensor with larger spatial dimensions. But in order to classify, I need to use the dense classifier, which was trained on a fixed spatial size; feeding it features from a different image size would break this.
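For example, the convolutional part accepts any spatial size as long as no fixed input shape is baked in. A minimal sketch with the same filter counts as in the summary above (the kernel sizes and activations are assumptions):

```python
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D

# Convolutional part only, no Flatten/Dense, so any spatial size works.
conv_part = tf.keras.Sequential([
    Conv2D(32, 3, activation='relu'),
    MaxPooling2D(2),
    Conv2D(64, 3, activation='relu'),
    MaxPooling2D(2),
])

# On a 25x25 input the feature map is 4x4x64, matching the summary above.
small = conv_part(tf.zeros((1, 25, 25, 1)))

# The same trained filters applied to a larger image give a larger map.
large = conv_part(tf.zeros((1, 100, 100, 1)))
```

The Flatten and Dense layers are what pin the network to 25×25 inputs; the convolutions themselves do not care.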

My best idea so far is to slice the image into tiles and feed each tile of fixed size into the classifier. This seems to be the answer to another question.

Does something better exist that applies the trained filters to the whole image and can do some sort of local classification using the trained classifier?

Martin Ueding

1 Answer


As one solution, I suggest using the tf.image.extract_patches function to extract patches from the image and then applying your trained classifier to each patch. This has a few benefits:

  • You would get a dense response map which you can further process to accurately determine the position of letter occurrences in the whole image.

  • Since this is a built-in TensorFlow op, you can implement and run all of this as a single Keras model, and therefore take advantage of batch processing as well as accelerated CPU/GPU execution.

Here is a sketch of the solution:

import tensorflow as tf
from tensorflow.keras.layers import Input, Reshape, TimeDistributed
from tensorflow.keras.models import Model

whole_images = Input(shape=(img_rows, img_cols, 1))
patches = tf.image.extract_patches(
    whole_images,
    sizes=[1, 25, 25, 1],
    strides=[1, 1, 1, 1],  # increase the stride if you don't need a dense classification map
    rates=[1, 1, 1, 1],
    padding='SAME'
)
# `patches` has a shape of `(batch_size, num_row_locs, num_col_locs, 25*25)`,
# so we reshape it to apply the classifier to each patch independently.
reshaped_patches = Reshape((-1, 25, 25, 1))(patches)
dense_map = TimeDistributed(letter_classifier)(reshaped_patches)
# Reshape it back to the spatial layout of the patch grid.
# Note: `Reshape` needs static dimensions, so use `patches.shape`, not `tf.shape`.
num_row_locs, num_col_locs = patches.shape[1], patches.shape[2]
dense_map = Reshape((num_row_locs, num_col_locs))(dense_map)

# Construct the model
image_classifier = Model(whole_images, dense_map)

# Use it on the real images
output = image_classifier.predict(my_images)
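The resulting response map can then be post-processed to recover the letter positions, for example by simple thresholding. A sketch with made-up scores (the threshold value and the variable names are assumptions):

```python
import numpy as np

# Hypothetical sigmoid scores from the model, shape (num_row_locs, num_col_locs)
dense_map_np = np.array([[0.1, 0.2, 0.9],
                         [0.3, 0.95, 0.2],
                         [0.1, 0.1, 0.4]])

# Keep only locations where the classifier is confident.
threshold = 0.5
ys, xs = np.where(dense_map_np > threshold)

# With padding='SAME' and stride 1, each map coordinate corresponds to the
# center of a 25x25 patch in the original image.
locations = list(zip(ys, xs))
```

For overlapping detections of the same letter, something like non-maximum suppression over the thresholded map may also be useful.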
today