
I have been implementing an Android OCR tool that uses Tesseract to recognize digits only. So far it gives quite high accuracy with normal digit fonts. However, the accuracy is terrible for 7-segment digits (the kind found on LCDs).

I have tried cropping the image, whitelisting the characters 0 to 9, and some image processing, all to no avail. Any ideas on how to increase the accuracy? Tips on training Tesseract specifically for 7-segment digits would also help me a lot.

Thanks in advance.

laurie7
  • I don't think you can get good results without retraining. It would be nice if there were a publicly available traineddata file for 7-segment digits, but I wasn't able to find one when I looked. – rmtheis Nov 29 '12 at 19:19
  • Thank you for the reply. Your blog really helped me a lot in my implementation, so many thanks to you. I am planning to train it and am looking into bbtesseract for the boxing process. I would really appreciate it if anyone could share some tips on the training process, because the official documentation is kinda confusing to me. – laurie7 Nov 30 '12 at 04:44
  • You can use [jTessBoxEditor](http://vietocr.sourceforge.net/training.html) to edit or generate TIFF/box files to be used in training. There's also a PowerShell script `train.ps1` that helps automate the rest of the training. – nguyenq Dec 01 '12 at 15:24
  • @laurie7: did you find a good example for training Tesseract? – Terril Thomas Dec 13 '12 at 21:13
  • Does this command help? `tesseract img.png out -psm 7 digits` – yunas Jul 26 '13 at 07:23
  • If you can do some pre-classification before recognition, it will help Tesseract improve its confidence. For example, (3, 8, and 9) belong to the same category, and so do (2, 7), depending on the fonts considered. You can also use [tesseract-box-editor](http://code.google.com/p/tesseract-box-editor/) to edit box files and to recalibrate the segmented blobs. – Y.AL Oct 17 '13 at 13:28
  • It is interesting to have a look at this project: http://www.unix-ag.uni-kl.de/~auerswal/ssocr/ – Y.AL Sep 12 '14 at 14:13
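
As an illustration of the segment-based idea behind tools like ssocr (this is not ssocr's actual code, just a minimal sketch), a 7-segment digit can be read without any trained model by checking which of the seven segments are lit and looking the pattern up in a table. The names `read_segments`, `decode_digit`, and `SAMPLE_POINTS` are hypothetical, and the sketch assumes the digit has already been cropped and binarized into rows of characters where `#` marks a lit pixel.

```python
# Segment order: a (top), b (top right), c (bottom right), d (bottom),
# e (bottom left), f (top left), g (middle).
SEGMENTS_TO_DIGIT = {
    (1, 1, 1, 1, 1, 1, 0): 0,
    (0, 1, 1, 0, 0, 0, 0): 1,
    (1, 1, 0, 1, 1, 0, 1): 2,
    (1, 1, 1, 1, 0, 0, 1): 3,
    (0, 1, 1, 0, 0, 1, 1): 4,
    (1, 0, 1, 1, 0, 1, 1): 5,
    (1, 0, 1, 1, 1, 1, 1): 6,
    (1, 1, 1, 0, 0, 0, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

# Relative (row, column) positions sampled for segments a..g.
# These fractions are an assumption and would need tuning for a real display.
SAMPLE_POINTS = (
    (0.05, 0.5),  # a
    (0.25, 0.9),  # b
    (0.75, 0.9),  # c
    (0.95, 0.5),  # d
    (0.75, 0.1),  # e
    (0.25, 0.1),  # f
    (0.50, 0.5),  # g
)


def read_segments(rows, lit="#"):
    """Sample one pixel per segment from a cropped, binarized digit."""
    height = len(rows)
    width = len(rows[0])
    return tuple(
        1 if rows[int(ry * height)][int(rx * width)] == lit else 0
        for ry, rx in SAMPLE_POINTS
    )


def decode_digit(segments):
    """Return the digit for a segment pattern, or None if it is unknown."""
    return SEGMENTS_TO_DIGIT.get(tuple(segments))


if __name__ == "__main__":
    glyph = [
        ".###.",
        "....#",
        "....#",
        ".###.",
        "#....",
        "#....",
        ".###.",
    ]
    print(decode_digit(read_segments(glyph)))  # prints 2
```

Sampling a single pixel per segment is fragile on real photos; averaging a small region per segment would be more robust, at the cost of a few more lines.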

1 Answer


You can find traineddata for 7-segment digits at:

https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital

There is also sample Python code in the same repository.
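
A minimal sketch of how the traineddata could be used from the Tesseract command line, wrapped in Python. It assumes the file is named `letsgodigital.traineddata` and has been copied into your tessdata directory (the name is an assumption based on the repository folder), and that the stock `digits` config file is installed. The helper names `build_tesseract_command` and `recognize` are hypothetical. Tesseract 3.x takes `-psm`, while 4.x and later take `--psm`.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base,
                            lang="letsgodigital", legacy_psm_flag=True):
    """Build the argument list for a single-line, digits-only run.

    lang is an assumed traineddata name; adjust it to match your file.
    """
    psm_flag = "-psm" if legacy_psm_flag else "--psm"
    # Page segmentation mode 7 treats the image as a single text line.
    return ["tesseract", image_path, output_base,
            "-l", lang, psm_flag, "7", "digits"]


def recognize(image_path, output_base="out", **kwargs):
    """Run Tesseract and return the text it wrote to <output_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, **kwargs)
    subprocess.run(command, check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read().strip()
```

For example, `recognize("img.png")` would run `tesseract img.png out -l letsgodigital -psm 7 digits` and return the contents of `out.txt`.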

art
  • Could you tell me how you trained Tesseract for digits only? – malaguna Jul 21 '16 at 07:09
  • I generated some images using a font called "lets go digital", added some noise using GIMP, used [jTessBoxEditor](http://vietocr.sourceforge.net/training.html) to generate the box data, and used [this](https://github.com/this-is-ari/python-tesseract-3.02-training) tool for training. Read the [Tesseract OCR FAQ](https://github.com/tesseract-ocr/tesseract/wiki/FAQ) for more details. I have also shared the [training sources](https://github.com/arturaugusto/display_ocr/tree/master/training_source). – art Jul 21 '16 at 23:05
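
The noise step described above was done by hand in GIMP. If you would rather script it, here is a rough standard-library sketch, not the method used for the shared traineddata. It assumes the rendered digit image is already available as rows of 0-255 grayscale values; the choice of salt-and-pepper noise and the names `add_salt_and_pepper` and `write_pgm` are assumptions made for illustration.

```python
import random


def add_salt_and_pepper(pixels, fraction, seed=None):
    """Return a noisy copy of a grayscale image given as rows of 0-255 ints.

    Each pixel is replaced by pure black or pure white with probability
    `fraction`. The input image is left unmodified.
    """
    rng = random.Random(seed)
    noisy = [list(row) for row in pixels]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < fraction:
                row[x] = rng.choice((0, 255))
    return noisy


def write_pgm(path, pixels):
    """Write the image as a binary PGM (P5) file."""
    height = len(pixels)
    width = len(pixels[0])
    with open(path, "wb") as handle:
        handle.write(b"P5\n%d %d\n255\n" % (width, height))
        for row in pixels:
            handle.write(bytes(row))
```

Generating several copies with different seeds and noise fractions gives varied samples from a single clean rendering; the files may need converting to TIFF before they are opened in a box editor.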