How can I define the font type for tesseract to use in recognition (not in training)?

Question

For the downloadable English dataset I run

cat tessdata/eng.* | egrep -o ".*ttf" | sort -u

and get a list of all fonts that were used to train the English language data:

Andale_Mono.ttf
Arial_Black.ttf
Arial_Bold.ttf
Arial.ttf
buttf
Comic_Sans_MS_Bold.ttf
Comic_Sans_MS.ttf
Courier_New_Bold.ttf
Courier_New.ttf
Georgia_Bold.ttf
Georgia.ttf
Gottf
Impact.ttf
Times_New_Roman_Bold.ttf
Times_New_Roman.ttf
Trebuchet_MS_Bold.ttf
Trebuchet_MS.ttf
ttf
Verdana_Bold.ttf
Verdana.ttf

Now I want to recognize text whose font I already know, so I want to restrict recognition to that font. I tried:

api.SetVariable("classify_font_name", "Arial_Bold.ttf");

but the results are no better. Can someone tell me how to do this, or whether it is even possible?

score -1 · Answer 1 · edited Feb 13 '16 at 01:35


You can use the LTRResultIterator class and its WordFontAttributes method to obtain the font info of the results at word or character level. Once you have the font attributes, you can filter the output text by font name. See the Tesseract API examples.
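A minimal sketch of the filtering step, assuming the words and their reported font names have already been collected while iterating over the results. `WordResult` and `keepWordsInFont` are hypothetical names for illustration, not part of the Tesseract API, and the substring match is an assumption, since the reported font name may not be spelled exactly like the .ttf file name.

```cpp
#include <string>
#include <vector>

// Hypothetical record: one recognized word plus the font name reported for it
// (e.g. copied from the const char* returned by
// LTRResultIterator::WordFontAttributes; left empty if that was null).
struct WordResult {
    std::string text;
    std::string fontName;
};

// Keep only the words whose reported font name contains `wanted`.
// Words with no reported font name are dropped.
std::vector<WordResult> keepWordsInFont(const std::vector<WordResult>& words,
                                        const std::string& wanted) {
    std::vector<WordResult> kept;
    for (const WordResult& w : words) {
        if (!w.fontName.empty() &&
            w.fontName.find(wanted) != std::string::npos) {
            kept.push_back(w);
        }
    }
    return kept;
}
```

For example, calling `keepWordsInFont(words, "Arial_Bold")` would return only the words reported as Arial Bold. Note that this only post-filters what was already recognized; it does not restrict the engine itself to a single font.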

answered May 03 '14 at 00:47 by nguyenq · edited Feb 13 '16 at 01:35 by Civilian

The question was how to choose a specific font for recognition and use only the traineddata of that single font. – Jakob Kroeker Jul 29 '16 at 08:44

I misread the question. See http://stackoverflow.com/questions/13154150/explicitly-set-the-font-to-be-used-for-recognition-by-tesseract-ocr?rq=1 – nguyenq Aug 14 '16 at 14:39
