I'm following these instructions for training the Tesseract OCR engine for a new font.

However, when trying to make the box file, I get an error. This is the command I use:

H:\Documents\TesseractTraining>tesseract eng.helvetica.exp0.tif eng.helvetica.exp0 batch.nochop makebox

And here is the error message:

Tesseract Open Source OCR Engine v3.02 with Leptonica
TIFFstream: Sorry, can not handle image.
Unsupported image type.

Some googling suggests that there might be a problem with the Leptonica installation. I don't even know whether Leptonica is installed on my computer, and its webpage is quite confusing: it has several READMEs (one called "README" and one called "Documentation"), and none of them is simple enough for me to understand how to make it work on Windows. I have the Express Edition of Visual Studio 2008, so I can't use the command prompt they suggest.

So, my question is: does anybody know what might be wrong and how I can fix it?

Oskar Birkne

1 Answer

It looks like your TIFF image is in a format that Tesseract cannot read. You can use the jTessBoxEditor tool to create TIFF images suitable for training.
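If you want to see what is unusual about the image before regenerating it, one option is to look at the TIFF header yourself. The following is only a minimal sketch in plain Python, not something taken from the Tesseract or Leptonica documentation: the helper name `read_tiff_tags` is made up for illustration, it reads only the first image directory of a classic (non-BigTIFF) file, and it only reports a few baseline tags. Which combinations your Tesseract/Leptonica build accepts is an open question here, so treat the output as diagnostic information rather than a verdict.

```python
import struct

# Baseline tags from the TIFF 6.0 specification that describe the pixel encoding.
TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    277: "SamplesPerPixel",
}

# struct format characters for the TIFF field types handled here:
# 1 = BYTE, 3 = SHORT, 4 = LONG. Other field types are skipped.
TYPE_FORMATS = {1: "B", 3: "H", 4: "I"}


def read_tiff_tags(data):
    """Return selected tags from the first IFD of a TIFF given as bytes.

    Hypothetical helper for illustration only.
    """
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF file (bad byte-order mark)")

    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file (magic number is not 42)")

    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    result = {}
    for i in range(entry_count):
        # Each IFD entry is 12 bytes: tag, field type, count, value-or-offset.
        entry_offset = ifd_offset + 2 + 12 * i
        tag, field_type, count = struct.unpack_from(endian + "HHI", data, entry_offset)
        if tag not in TAGS or field_type not in TYPE_FORMATS:
            continue
        fmt = endian + str(count) + TYPE_FORMATS[field_type]
        if struct.calcsize(fmt) <= 4:
            # Values that fit in 4 bytes are stored inline in the entry.
            value_offset = entry_offset + 8
        else:
            # Otherwise the entry holds an offset to the values.
            (value_offset,) = struct.unpack_from(endian + "I", data, entry_offset + 8)
        values = struct.unpack_from(fmt, data, value_offset)
        result[TAGS[tag]] = values[0] if count == 1 else list(values)
    return result
```

You could call it as `read_tiff_tags(open("eng.helvetica.exp0.tif", "rb").read())`. In the TIFF 6.0 specification a Compression value of 1 means uncompressed, 5 means LZW and 32773 means PackBits, and BitsPerSample together with SamplesPerPixel tells you whether the file is bilevel, grayscale or color. If the values look exotic, re-saving the image as a plain uncompressed TIFF is a reasonable thing to try, though I can't promise it resolves this particular error.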

nguyenq
  • Shameless plug, but... I've also created a program that does this (generates box file+TIFF) written in Qt - https://code.google.com/p/tesseract-trainer/ – sashoalm Mar 29 '13 at 09:03
  • jTessBoxEditor does not give me a good TIFF. Did this solution work for you? Did you add any options to get a correct TIFF file? When I run batch.nochop makebox I get the error tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file ..\..\classify\adaptmatch.cpp, line 555 – blganesh101 Feb 17 '14 at 05:54