
So far, I have built an OCR app using tess-two. To build the app, I downloaded and built the tess-two library (thanks, rmtheis!).

I need to improve the OCR output, because accuracy is currently below 20%. I am working only with digits (0 to 9), and I hope to reach 100% accuracy.

I have downloaded Ghostscript, VietOCR and Serak, as recommended by some blogs I went through. They cover the training process in more detail than most other posts on the subject (links: Pradeep's blog, the reachsri site).

My question is: do I have to download the Tesseract app as well?

Some steps in the training seem to imply that I will be running commands beginning with "tesseract.exe", and I don't have any such file on my computer.

Do I still need to download the tesseract app? Or can I work with tess-two?

Any and all help will be appreciated.

Tom Fuller
GeorgeF

1 Answer


You can train Tesseract on Windows or Linux and use the generated .traineddata file with tess-two. Make sure your tool includes the Tesseract training executables.
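A minimal sketch of the "use the generated .traineddata with tess-two" step. tess-two's `TessBaseAPI.init(datapath, language)` expects `datapath` to contain a `tessdata` subfolder holding `<language>.traineddata`, so after training on the desktop you copy the file into that layout on the device. The helper below only checks the layout before `init` is called; the class and method names, the `/sdcard/ocr` path and the `digits` language code are hypothetical placeholders.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class TessDataLayout {
    // Hypothetical helper: the file tess-two looks for is
    // <dataPath>/tessdata/<language>.traineddata
    static Path trainedDataFile(Path dataPath, String language) {
        return dataPath.resolve("tessdata").resolve(language + ".traineddata");
    }

    // True only if the tessdata folder exists and the traineddata file is present,
    // i.e. the layout TessBaseAPI.init(dataPath, language) expects.
    static boolean isReady(Path dataPath, String language) {
        return Files.isDirectory(dataPath.resolve("tessdata"))
                && Files.isRegularFile(trainedDataFile(dataPath, language));
    }
}
```

If `isReady` returns true, the Android side would then call something like `baseApi.init(dataPath.toString(), "digits")` on a `com.googlecode.tesseract.android.TessBaseAPI` instance.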

nguyenq
  • Dear Nguyenq, thanks for your reply. I want to be sure that I understand correctly: should I download the Tesseract app and install it on my Windows 7 machine? Is the "Tesseract training executable" a separate package? – GeorgeF Feb 28 '16 at 01:53
  • Yes, and you'll have to build the training executables from source. Alternatively, they come bundled in some training tools, such as [jTessBoxEditor](http://vietocr.sourceforge.net/training.html) or [others](https://github.com/tesseract-ocr/tesseract/wiki/AddOns). – nguyenq Feb 28 '16 at 02:48
  • Thanks again, and sorry for the late reply. I'm fairly new to this, so please help me understand: is there any reason or advantage to building Tesseract from source? Why is there no standalone "tesseract.exe" for Windows? I've done some searching and was surprised to find no results for "tesseract.exe". Why is that? Please share a link to a standalone exe if you have one, or, if you believe building from source is better, please give me step-by-step instructions. Honestly, though, I would prefer the standalone exe. Thanks again for your quick and helpful response! – GeorgeF Apr 04 '16 at 11:20
  • The Tesseract project does not provide Windows executables for new versions (there have been [requests](https://github.com/tesseract-ocr/tesseract/issues/209) for that); however, older ones can be found at https://sourceforge.net/projects/tesseract-ocr-alt/files/. The tool I mentioned bundles the Tesseract training executables with it. – nguyenq Apr 08 '16 at 13:38
  • Indeed! I went to jTessBoxEditor's SourceForge page, and guess who created jTessBoxEditor and VietOCR? Your work is simply amazing, sir, and your deep insight is inspiring. OK, enough said. One last thing: can I combine more than two traineddata files in code? I'm using Android Studio. You dealt with combining two [here](http://stackoverflow.com/questions/17420800/combine-trained-tesseract-files-into-one). I want to train for 5 fonts, and I am using Serak after creating box files. Can I combine 5 traineddata files the same way we combine 2? Thanks again. – GeorgeF Apr 18 '16 at 19:58
  • P.S.: I am going to mark this question as solved, because you have, for all intents and purposes, given me every clarification I needed regarding the original question. Thank you very much. – GeorgeF Apr 18 '16 at 20:05
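On the question of five traineddata files: besides merging them into one file, a hedged alternative is to load them side by side. Tesseract 3.x accepts several language codes joined with `+` (as in `-l eng+deu`), and, as far as I can tell from its source, tess-two's `TessBaseAPI.init(datapath, language)` splits that string on `+` and checks that each `.traineddata` file exists. The sketch below only builds and sanity-checks such a string; the class name and the font codes `fontA` through `fontE` are hypothetical, and whether five separate models give better accuracy or speed than one merged model is untested here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MultiLangSpec {
    // Joins language codes with '+', the syntax Tesseract 3.x uses to load
    // several traineddata files at once (e.g. "eng+deu").
    static String languageSpec(List<String> codes) {
        if (codes.isEmpty()) {
            throw new IllegalArgumentException("need at least one language code");
        }
        for (String c : codes) {
            if (c.isEmpty() || c.contains("+")) {
                throw new IllegalArgumentException("bad language code: " + c);
            }
        }
        return String.join("+", codes);
    }

    // Returns the codes whose <dataPath>/tessdata/<code>.traineddata is missing,
    // so a bad path is caught before calling init().
    static List<String> missingFiles(Path dataPath, List<String> codes) {
        List<String> missing = new ArrayList<>();
        for (String c : codes) {
            Path f = dataPath.resolve("tessdata").resolve(c + ".traineddata");
            if (!Files.isRegularFile(f)) {
                missing.add(c);
            }
        }
        return missing;
    }
}
```

With all five files in place, the resulting string (here `fontA+fontB+fontC+fontD+fontE`) would be passed as the `language` argument to `init`.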