My task is to find a certain letter on a picture of a document. Using classical computer vision I have segmented the image into characters. Then I used a neural network trained on 25×25 pixel images of characters to classify them into the one that I want and all others. Using this I can reconstruct the locations of these characters.
Now I want to apply the convnet directly to the whole image such that I do not have to rely on the classical segmentation. The network is a deep neural network consisting from 2D-convolutions, 2D-max-pooling layers and then a dense classifier. The network looks like this:
Layer (type) Output Shape Param #
=================================================================
conv2d_61 (Conv2D) (None, 23, 23, 32) 320
_________________________________________________________________
max_pooling2d_50 (MaxPooling (None, 11, 11, 32) 0
_________________________________________________________________
conv2d_62 (Conv2D) (None, 9, 9, 64) 18496
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 4, 4, 64) 0
_________________________________________________________________
flatten_46 (Flatten) (None, 1024) 0
_________________________________________________________________
dropout_5 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_89 (Dense) (None, 1) 1025
=================================================================
Total params: 19,841
Trainable params: 19,841
Non-trainable params: 0
I know that I can apply the convolutional part to the whole image with the trained filters. This would give me the response to these filters in the form of a tensor with larger spatial dimensions. But in order to classify, I need to use the classifier that is trained to a fixed amount of spatial information. Feeding a different size image would break this.
My best idea so far is to slice the image into tiles and feed each tile of fixed size into the classifier. This seems to be the answer to another question.
Does something better exist that applies the trained filters to the whole image and can do some sort of local classification using the trained classifier?