How to turn off the dictionary in tess4j?

Question

The character recognition accuracy of my tess4j OCR application is very low. I have heard that turning off the dictionary in tess4j can improve accuracy by allowing individual characters to be recognized, but I don't know how to do it. Does anyone know how to turn off the dictionary in tess4j?

Hi, did you solve the problem? I have almost the same problem: I want to use a regex to improve recognition accuracy. For example, the text in the image has a fixed format, `\d\d\w\w\d\d`. How can I do this with Tesseract in Java? – Bahramdun Adil, Jan 04 '16 at 06:57

nguyenq · Answer 1 · 2019-10-02T23:29:06.397

2

Set both dictionary variables to F (false), either through Tess4J's low-level TessAPI binding:

TessBaseAPISetVariable(handle, "load_system_dawg", "F");
TessBaseAPISetVariable(handle, "load_freq_dawg", "F");

or, on a Tesseract instance:

setTessVariable("load_system_dawg", "F");
setTessVariable("load_freq_dawg", "F");

Update:

Put the following in a file named, for example, bazaar, placed under the configs folder (inside tessdata):

load_system_dawg     F
load_freq_dawg       F

and then pass the name of the file to the appropriate method:

List<String> configs = Arrays.asList("bazaar");
instance.setConfigs(configs);
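
If you would rather create the config file from code than by hand, here is a minimal sketch using only the Java standard library. The class name DictionaryOffConfig, its method names, and the temporary directory standing in for the tessdata folder are all hypothetical and only for illustration; point it at your real tessdata path instead. As I understand it, these two variables are read when Tesseract initializes, which is presumably why the config-file route is suggested.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tess4J: it only writes the config file.
class DictionaryOffConfig {

    // The two settings from the answer, one "name value" pair per line.
    static List<String> configLines() {
        return Arrays.asList("load_system_dawg F", "load_freq_dawg F");
    }

    // Writes <tessdataDir>/configs/<name> and returns the path of the written file.
    static Path writeConfig(Path tessdataDir, String name) throws IOException {
        Path configsDir = tessdataDir.resolve("configs");
        Files.createDirectories(configsDir);
        return Files.write(configsDir.resolve(name), configLines(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory stands in for the real tessdata folder.
        Path tessdata = Files.createTempDirectory("tessdata");
        Path config = writeConfig(tessdata, "bazaar");
        System.out.println("Wrote " + config);
        // With Tess4J on the classpath, the file name is then passed as shown above:
        // instance.setConfigs(Arrays.asList("bazaar"));
    }
}
```

The file name passed to setConfigs has to match the name of the file written under configs.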

References:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
http://tess4j.sourceforge.net/docs/docs-1.4/

edited Oct 02 '19 at 23:29

answered Oct 20 '14 at 23:39


I think you also have to provide empty user_words_suffix and user_patterns_suffix files – sliders_alpha, Feb 05 '15 at 13:53
Where is this `TessBaseAPISetVariable` coming from? What package? – Francisco Souza, Sep 16 '19 at 13:24
