Do convolutional neural networks run faster on binary images

Question

I am trying a DCNN to recognize handwritten words (word spotting). The images are binary, and I am wondering whether computation will be faster than with gray-level or color images.

In addition, how can one equalize the image sizes? Normalizing the word images to a common size produces words at different scales.
Any suggestions?

score 1 · Answer 1 · answered May 17 '18 at 14:03

Computation on gray-scale images is certainly faster than on color images, but not because of the zeros; it comes down to the input tensor size. Color images are [batch, width, height, 3], while gray-scale images are [batch, width, height, 1]. A binary image stored as a single-channel tensor has the same shape as a gray-scale one, so by this reasoning it costs the same. The difference in depth, as well as in spatial size, affects the time spent on the first convolutional layer, which is usually one of the most time-consuming. That is why you should consider resizing the images as well.
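To make the tensor-size argument concrete, here is a rough sketch that counts multiply-accumulate operations for a first convolutional layer. The 64x64 input, 32 filters, 3x3 kernel, stride 1 and "same" padding are all assumptions chosen for illustration, and the helper `conv2d_macs` is hypothetical, not part of any library.

```python
def conv2d_macs(height, width, in_channels, out_channels, kernel=3):
    """Multiply-accumulates for one stride-1, 'same'-padded convolution.

    Every output pixel of every filter needs kernel*kernel*in_channels
    multiplications, regardless of whether the input values are 0 or 1.
    """
    return height * width * in_channels * out_channels * kernel * kernel


# Assumed sizes, for illustration only.
gray = conv2d_macs(64, 64, in_channels=1, out_channels=32)
color = conv2d_macs(64, 64, in_channels=3, out_channels=32)
half = conv2d_macs(32, 32, in_channels=1, out_channels=32)

print("gray 64x64 :", gray)
print("color 64x64:", color, "->", color / gray, "x gray")
print("gray 32x32 :", half, "->", gray / half, "x cheaper than gray 64x64")
```

Under these assumptions the color input costs three times as much in the first layer, and halving each spatial dimension cuts the cost by a factor of four. The count never looks at pixel values, which is why a dense convolution gains nothing from an image being binary.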

You may also want to read about the 1x1 convolution trick for speeding up computation. It is usually applied in the middle of the network, where the number of filters becomes large.
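As a rough illustration of why the 1x1 trick helps, the sketch below compares a direct 3x3 convolution with a 1x1 reduction followed by a 3x3 convolution. The channel counts (256 reduced to 64) and the 16x16 feature map are assumed numbers, and `conv_macs` is a hypothetical helper, not a library function.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a stride-1, 'same'-padded k x k convolution."""
    return h * w * c_in * c_out * k * k


# Assumed mid-network feature map: 16x16 with 256 channels.
h = w = 16

# Direct 3x3 convolution, 256 -> 256 channels.
direct = conv_macs(h, w, 256, 256, 3)

# Bottleneck: 1x1 convolution shrinks 256 -> 64 channels,
# then the 3x3 convolution maps 64 -> 256.
bottleneck = conv_macs(h, w, 256, 64, 1) + conv_macs(h, w, 64, 256, 3)

print("direct    :", direct)
print("bottleneck:", bottleneck)
print("speed-up  :", direct / bottleneck)
```

With these assumed sizes the bottleneck version needs about 3.6 times fewer multiply-accumulates. The saving grows with the number of channels, which matches the remark that the trick pays off in the middle of the network.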

As for the second question (if I understand it correctly), ultimately you have to resize the images. If the images contain text in different font sizes, one possible strategy is resize + pad or crop + resize. You need to know the font size in each image to select the right padding or crop size, so this method may require a fair amount of manual work.
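One way the resize + pad idea could look is sketched below. It only computes the geometry: each word is scaled without distortion to fit a fixed canvas, and the leftover area is reported as padding. The 256x32 canvas, the example word sizes and the helper `fit_to_canvas` are all assumptions for illustration; the actual pixel resize would be done with whatever image library you use.

```python
def fit_to_canvas(width, height, canvas_w, canvas_h):
    """Scale (width, height) to fit inside the canvas, keeping aspect ratio.

    Returns (new_w, new_h, pad_right, pad_bottom); the padding would be
    filled with background pixels (zeros for a binary image).
    """
    scale = min(canvas_w / width, canvas_h / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return new_w, new_h, canvas_w - new_w, canvas_h - new_h


# Hypothetical word images of equal height but different width.
short_word = fit_to_canvas(90, 40, canvas_w=256, canvas_h=32)
long_word = fit_to_canvas(330, 40, canvas_w=256, canvas_h=32)

print("short word:", short_word)
print("long word :", long_word)
```

In this sketch the short word keeps nearly the same letter height as the long one (32 vs 31 pixels) and is padded on the right, instead of being stretched to fill the canvas. The cost is a padded region of empty pixels for short words.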

A completely different approach would be to ignore these differences and let the network learn OCR despite the font size discrepancy. This is a viable solution that doesn't require much manual pre-processing; it simply needs more training data to avoid overfitting. If you examine the MNIST dataset, you will notice that the digits are not always the same size, yet CNNs achieve 99.5% accuracy pretty easily.

I forgot to say that I am doing word spotting. If we have two words written by the same person, the first is "Imagination" and the other is "Bar". They might have the same height, but the width of "Imagination" will be more than the width of "Bar". Scaling both images to 64x64 will yield smaller fonts in the image of "Imagination" than "Bar". I was thinking to pad the image of "Bar" to match the size of "Imagination", then scale both. But, this will leave the image "Bar" with some emptiness (a lot of zeros). Data augmentation will increase the data size. — innuendo, May 17 '18 at 14:32
