
I have a neural network (NN) in deeplearning4j, trained on MNIST to recognise digits in an image. Since the MNIST set contains 28x28 pixel images, I am able to predict the class of a 28x28 image using this NN.

Now I am trying to find out how to apply this NN to a picture of a handwritten page. How do I convert the text in that image to actual text (OCR)? Basically: what kind of preprocessing is needed, how do I find the part of the image where the text is, and how do I derive smaller pieces of that image so I can apply the NN to each one individually?

Patriks

1 Answer


You may want to explore handwritten text recognition (HTR) using TensorFlow. There are some interesting implementations already available that are widely used as baseline models for this task. One such implementation can be found here.

[Image: HTR system architecture diagram]

The architecture above details how to design such a system. You can of course modify it further to suit your requirements.
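To make the shape of such a pipeline concrete, here is a minimal Keras sketch of a CNN-RNN-CTC model of the kind these HTR baselines use. The input size, character set size, and layer widths are illustrative assumptions, not the exact configuration of the linked implementation:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

IMG_W, IMG_H = 128, 32   # assumed word-image size (width, height)
NUM_CHARS = 80           # assumed character set size (+1 below for the CTC blank)

# (width, height, channels) layout, so the width axis becomes the time axis
inputs = keras.Input(shape=(IMG_W, IMG_H, 1), name="image")

# CNN feature extractor: two conv/pool stages downsample by 4 in each direction
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)

# Collapse the height axis so each remaining width step is one RNN time step
x = layers.Reshape((IMG_W // 4, (IMG_H // 4) * 64))(x)

# Bidirectional LSTM reads the feature sequence left to right and back
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)

# Per-time-step character distribution; the extra class is the CTC blank
outputs = layers.Dense(NUM_CHARS + 1, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.summary()
```

Training such a model uses CTC loss (e.g. `tf.nn.ctc_loss`), which lets the network learn the alignment between its per-time-step outputs and the target character sequence on its own, so you never have to segment individual characters by hand.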

If you are working with a combination of data sources, or are trying to understand the preprocessing steps for such images, here is a link that could guide you.


The primary preprocessing step is to detect and crop words so that they are manageable by the underlying TensorFlow HTR or Tesseract architecture.
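As a rough sketch of that cropping step, one common OpenCV approach is to binarize the page, dilate so the letters of each word merge into a single blob, and then crop the bounding box of every blob. The file name, kernel size, and minimum-area threshold here are assumptions you would tune for your pages:

```python
import cv2

page = cv2.imread("handwritten_page.png", cv2.IMREAD_GRAYSCALE)

# Binarize with Otsu so ink becomes white foreground
_, binary = cv2.threshold(page, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Dilate with a wide, short kernel so characters within a word merge
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
blobs = cv2.dilate(binary, kernel, iterations=1)

# Each external contour of the dilated image is roughly one word
contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

words = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 100:                          # drop specks of noise
        words.append(page[y:y + h, x:x + w]) # crop from the original page
```

Each crop can then be resized and padded to whatever input size the recognition model expects (28x28 in your single-digit MNIST case, or the word-image size of an HTR model).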

You may want to look at cropyble, which packages the cropping and the word extraction in one go. You can also use it just for cropping the image to extract word sequences for other downstream tasks.

Akshay Sehgal