I'm trying to train Tesseract for a new font that will be used in my Android app. I only need to recognize digits, so I created one training image, a box file and a unicharset file.

I followed the training instructions, but when I try to run tesseract it reports the error "bad read of inttemp!".

What am I doing wrong? How can I diagnose this error?

Shog9
Dipin
  • Training is quite painful. Carefully examine the logs of your training workflow for warnings and errors. If something goes wrong, your training data is useless. – n3utrino Feb 13 '13 at 14:20
  • @gabe, can you suggest any helpful links? – Dipin Feb 14 '13 at 04:12
  • I spent a lot of time in https://groups.google.com/forum/?fromgroups=#!forum/tesseract-ocr. Maybe this is something for you: https://gitorious.org/ancient-greek-training-for-tesseract/tesstrainingtools – n3utrino Feb 14 '13 at 10:57
  • Thanks, gabe. I will look into this :) – Dipin Feb 14 '13 at 11:10
  • It is for Tesseract 3, but maybe it helps: http://michaeljaylissner.com/blog/adding-new-fonts-to-tesseract-3-ocr-engine – n3utrino Feb 15 '13 at 09:56
  • A box editor that may be of use: http://vietocr.sourceforge.net/training.html – n3utrino Feb 15 '13 at 10:39

1 Answer

http://code.google.com/p/tesseract-ocr/issues/detail?id=155

It turns out that tesseract was still going back to the "C:\Program Files\Tesseract-OCR" folder, including using the 3.0 training exes in the training folder there. It made no difference where I ran the command from; I guess tesseract ignores the current directory when it has a path variable.

I replaced all of the 3.0 exe and training files in that folder with the 2.0.4 files, and the extract command worked! I should have solved the problem faster for all sorts of reasons, but...
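If you suspect the same kind of version mix-up, one way to check is to list every copy of a given executable that is reachable through PATH, in search order. The following is only a minimal sketch: `find_all_on_path` is a hypothetical helper name, not part of Tesseract, and the tool names in the usage lines are just examples of programs you might look for. It only inspects PATH; it does not look at any other environment variable or at how Tesseract locates its data files.

```python
import os


def find_all_on_path(name, path=None, pathext=None):
    """Return every file called `name` found in the PATH directories, in search order.

    `path` and `pathext` default to the PATH and (on Windows) PATHEXT
    environment variables; they can be passed explicitly for testing.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        pathext = os.environ.get("PATHEXT", "") if os.name == "nt" else ""

    # Try the bare name first, then the name plus each executable suffix.
    suffixes = [""] + [ext for ext in pathext.split(os.pathsep) if ext]

    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')
        if not directory:
            continue
        for suffix in suffixes:
            candidate = os.path.abspath(os.path.join(directory, name + suffix))
            key = os.path.normcase(candidate)
            # Skip duplicates caused by a directory being listed twice.
            if key in seen:
                continue
            if os.path.isfile(candidate):
                seen.add(key)
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Example tool names; adjust to the executables you actually run.
    for tool in ("tesseract", "mftraining", "cntraining"):
        copies = find_all_on_path(tool)
        print(tool, "->", copies if copies else "not found on PATH")
```

The first entry in each list is the copy that a plain command name would normally resolve to. If more than one copy shows up, or the first one sits in a folder belonging to a different Tesseract version than the one you followed the training instructions for, that is worth ruling out before looking further.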

Try this as well:

http://www.win.tue.nl/~aeb/linux/ocr/tesseract.html