
Tesseract 5.3.0.20221222

When I run the command

tesseract.exe 1.png 1 box.train

I get the output

row xheight=25, but median xheight = 16
row xheight=25.5, but median xheight = 16
row xheight=25.5, but median xheight = 16
row xheight=25, but median xheight = 16
row xheight=25.5, but median xheight = 16
row xheight=25.5, but median xheight = 16
row xheight=13, but median xheight = 16
row xheight=12, but median xheight = 16
row xheight=10.4167, but median xheight = 16
row xheight=10.4167, but median xheight = 16
row xheight=12, but median xheight = 16
row xheight=10.4167, but median xheight = 16
row xheight=10.4167, but median xheight = 16
FAIL!
APPLY_BOXES: boxfile line 59/0 ((8,663),(26,695)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 133/0 ((8,460),(26,492)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 143/0 ((458,460),(476,492)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 211/0 ((7,353),(17,372)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 285/0 ((7,213),(17,232)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 295/0 ((277,213),(287,232)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 363/0 ((7,122),(15,137)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 437/0 ((7,9),(15,24)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 447/0 ((217,9),(225,24)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:     456
   Boxes failed resegmentation:       9
   Found 447 good blobs.
Generated training data for 51 words

In the part of the error that reads "... boxfile line xxx/y", xxx is the line number in the box file and y is the character in question. It always fails for the character "0".
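To compare the failing boxes side by side, the FAILURE lines can be pulled out of the log with a small script. This is only a sketch: the regular expression is written against the log lines quoted above, and the helper name `parse_failures` and the field names are made up for illustration. Width and height are simply the differences between the two coordinate pairs.

```python
import re

# Matches lines such as:
# APPLY_BOXES: boxfile line 59/0 ((8,663),(26,695)): FAILURE! Couldn't find a matching blob
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(\S+) "
    r"\(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def parse_failures(log_text):
    """Return one dict per failed box found in the Tesseract log text."""
    failures = []
    for match in FAILURE_RE.finditer(log_text):
        x1, y1, x2, y2 = (int(match.group(i)) for i in range(3, 7))
        failures.append({
            "line": int(match.group(1)),   # line number in the box file
            "char": match.group(2),        # character the box is labelled with
            "corners": (x1, y1, x2, y2),   # the two coordinate pairs, as printed
            "width": x2 - x1,
            "height": y2 - y1,
        })
    return failures


# Three of the failure lines from the output above.
sample_log = """\
FAIL!
APPLY_BOXES: boxfile line 59/0 ((8,663),(26,695)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 211/0 ((7,353),(17,372)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 437/0 ((7,9),(15,24)): FAILURE! Couldn't find a matching blob
"""

failures = parse_failures(sample_log)
for failure in failures:
    print(failure["line"], failure["char"], failure["width"], failure["height"])
```

Run against the full log, this lists every failing box with its size, which makes it easy to confirm that only "0" is affected and to see how the box sizes differ between the rows.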

I am using QT Box Editor v1.12rc1 to fix the boxes that Tesseract itself generated with the command

tesseract.exe 1.png 1 batch.nochop makebox

Here is a screenshot of QT Box Editor showing the zeroes: [screenshot of QT Box Editor showing correct boxes around all characters, with one of the zeroes in question highlighted]

I tried copying the coordinates from the error, ((217,9),(225,24)), and pasting them into QT Box Editor (there's a function specifically for that), and it draws a box precisely around the zero in question.
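The same check can be done without the editor by looking the coordinates up in the box file directly. The sketch below assumes the usual box file layout of one `symbol left bottom right top page` entry per line, and that the two pairs in the error are the (left, bottom) and (right, top) corners. The sample box text is hypothetical, not taken from the real 1.box, and `find_boxes` is an illustrative helper name.

```python
def parse_box_line(line):
    """Split one box file line into (symbol, (left, bottom, right, top), page)."""
    parts = line.split()
    # The last five fields are numbers; whatever precedes them is the symbol.
    left, bottom, right, top, page = (int(p) for p in parts[-5:])
    symbol = " ".join(parts[:-5])
    return symbol, (left, bottom, right, top), page


def find_boxes(box_text, coords):
    """Return (line_number, symbol) for every box whose corners equal coords."""
    hits = []
    for line_number, line in enumerate(box_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        symbol, box, _page = parse_box_line(line)
        if box == coords:
            hits.append((line_number, symbol))
    return hits


# Hypothetical excerpt of a box file, for illustration only.
sample_box_text = """\
9 200 9 208 24 0
0 217 9 225 24 0
1 230 9 236 24 0
"""

# Coordinates copied from the error message ((217,9),(225,24)).
matches = find_boxes(sample_box_text, (217, 9, 225, 24))
print(matches)
```

If the lookup returns the expected line labelled "0", the box file entry itself is consistent with the error message, and the mismatch lies in how Tesseract segments the image rather than in the coordinates.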

I thought it might have to do with the 0 being the first character on the line, so I added a second set of 0-9 digits after the first one in three places. In total there are 9 zeroes: 6 at the left-most position and 3 in the middle of a line. All 9 zeroes in the picture throw the error, and no other character does.

Honestly, I have no clue what is happening. Googling this error hasn't helped me at all. When I continue with training, the character 0 obviously doesn't get trained, and it is then recognized as 8 most of the time.

What am I doing wrong?

I expect the blob of pixels representing zero to be recognized as such, as it works with every other character.

Maximus
  • You are running training for the legacy engine (Tesseract 3.x). Is that intentional? Also, posting the original image (with its box file) would be more helpful than a screenshot from the training tool. – user898678 Feb 04 '23 at 09:30
  • I had not realized that the files distributed with Tesseract 5 are for training Tesseract 3. Definitely not intentional. So I guess there's really no way to train Tess5 on Windows? – Maximus Feb 07 '23 at 09:43
  • What about reading tesseract documentation? – user898678 Feb 08 '23 at 07:04
  • Yes, I haven't just spent 2 days googling for a guide on how to train it, soaking up everything, thanks. Anyway, I found this guide https://www.youtube.com/watch?v=KE4xEzFGSU8 which led to setting up WSL with Ubuntu and a rabbit hole of fixing "no such command" errors. I fixed a few issues in the video guide by correlating it with the official guide and got it to somewhat train, and the result still sucks after training for quite some time. I guess I'll go read more docs or give up, thanks for the amazing help – Maximus Feb 09 '23 at 08:39
