
I am trying to OCR the following image using tessnet.

[Image: To OCR]

Unfortunately, it doesn't work. I've tried rescaling the image and converting it to a bitonal image, but neither helped.

I am using the English language pack with the following whitelist:

 _ocr.SetVariable("tessedit_char_whitelist", "0123456789.,abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");

and the following OCR code:

 var resultScaled = _ocr.DoOCR(bmpScaled, Rectangle.Empty);

Can you please advise what operations I can perform to improve the quality of the image for tessnet?

Jack
  • I personally had more success with the Tesseract wrapper provided by CharlesW. I'd give that a try; it works almost the same as Tessnet2 and it's updated regularly as well. You can find the wrapper here: https://github.com/charlesw/tesseract. – Michel de Nijs Apr 16 '15 at 13:29
  • Along with changing the image to bitonal and upscaling it, you might want to invert it as well, so that the text is black and the background is white. – juharr Apr 16 '15 at 13:31
  • Thanks, guys. I found both answers very helpful. Inverting the colors helped a lot, as did using the Tesseract wrapper recommended by Michel. Now I am struggling with accuracy. At this stage I only need to recognize currency pairs. There are around 300 pairs, so I am thinking of creating a dictionary of pairs. I've read about "bazaar", but it doesn't work for me. I followed the instructions from https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html and I keep getting: "Could not open file, ./tessdata/eng.user-words." – Jack Apr 17 '15 at 15:30
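
The preprocessing suggested in the comments (upscale the image, then invert it so the text is dark on a light background) could be sketched with System.Drawing roughly as follows. This is only a minimal sketch: the class name `OcrPreprocess`, the method name `UpscaleAndInvert`, and the choice of bicubic interpolation are assumptions for illustration, not something taken from the question or from either wrapper.

```csharp
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper; the names are illustrative only.
public static class OcrPreprocess
{
    // Returns a new bitmap that is 'source' enlarged by 'scale'
    // with its RGB channels inverted.
    public static Bitmap UpscaleAndInvert(Bitmap source, float scale)
    {
        int width = (int)(source.Width * scale);
        int height = (int)(source.Height * scale);
        var result = new Bitmap(width, height, PixelFormat.Format24bppRgb);

        // Color matrix mapping each color channel c to 1 - c; alpha is unchanged.
        var invert = new ColorMatrix(new[]
        {
            new float[] { -1,  0,  0, 0, 0 },
            new float[] {  0, -1,  0, 0, 0 },
            new float[] {  0,  0, -1, 0, 0 },
            new float[] {  0,  0,  0, 1, 0 },
            new float[] {  1,  1,  1, 0, 1 }
        });

        using (var g = Graphics.FromImage(result))
        using (var attributes = new ImageAttributes())
        {
            attributes.SetColorMatrix(invert);
            // Smooth interpolation so enlarged glyph edges are not blocky.
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source,
                new Rectangle(0, 0, width, height),
                0, 0, source.Width, source.Height,
                GraphicsUnit.Pixel, attributes);
        }
        return result;
    }
}
```

The returned bitmap could then be passed to the existing call, e.g. `_ocr.DoOCR(OcrPreprocess.UpscaleAndInvert(bmp, 3f), Rectangle.Empty)`, where `bmp` stands for the original image and the factor 3 is an arbitrary starting point to experiment with, not a recommended value.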
