I'm building a parsing algorithm for images. Tesseract is not giving me the accuracy I need, so I'm thinking of building a CNN+LSTM-based model for image-to-text conversion. Is my approach the right one? Can I extract only the required string directly from the CNN+LSTM model instead of using NLP? Or do you see any other ways to improve Tesseract's accuracy?
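For reference, the kind of CNN+LSTM (CRNN-style) model I have in mind looks roughly like this. This is only a minimal PyTorch sketch: the layer sizes, the 37-symbol alphabet, and the fixed 32-pixel input height are placeholder assumptions, and the CTC loss/decoding needed for training is omitted.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CNN feature extractor feeding a bidirectional LSTM.

    Assumes grayscale input images of height 32; width is treated as the
    sequence (time) axis after the convolutions downsample it by 4x.
    """
    def __init__(self, num_classes=37):  # placeholder alphabet size
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # After two 2x poolings, height 32 -> 8, so each time step carries
        # 64 channels * 8 rows = 512 features.
        self.lstm = nn.LSTM(input_size=64 * 8, hidden_size=128,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                 # x: (B, 1, 32, W)
        f = self.cnn(x)                   # (B, 64, 8, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # (B, W/4, 512)
        out, _ = self.lstm(f)             # (B, W/4, 256)
        return self.fc(out)               # per-timestep class logits

model = CRNN()
logits = model(torch.rand(2, 1, 32, 128))
print(logits.shape)  # torch.Size([2, 32, 38])
```

The idea is that the CNN reads the image left to right and the LSTM turns the column features into a character sequence; training would use `nn.CTCLoss` against the target strings.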
NLP is used to let a network try to "understand" text. I think what you want here is to detect whether (and where) a picture contains text. For that, NLP is not required, since you are not asking the network to analyze or understand the text; this is closer to an object detection problem.
There are many models that do object detection. Some off the top of my head are YOLO, R-CNN, and Mask R-CNN.

asong24