The character recognition accuracy of my tess4j OCR application is very low. I have heard that turning off the dictionary in tess4j improves accuracy by letting individual characters be recognized on their own, but I don't know how to do it. Does anyone know how to turn off the dictionary in tess4j?
Hi, did you solve the problem? I have almost the same problem: I want to use a regex to improve recognition accuracy. For example, the text in the image always has the fixed format `\d\d\w\w\d\d`. How can I do that with Tesseract in Java? – Bahramdun Adil Jan 04 '16 at 06:57
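The answer below does not cover the regex part, so here is one possible approach, offered as an assumption rather than a Tesseract feature: run OCR as usual, then validate or extract the expected format from the returned text with `java.util.regex`. The class `FixedFormatCheck` and its methods are hypothetical names for this sketch. Keep in mind that `\w` also matches digits and `_`, so `[A-Za-z]` is stricter if the middle two characters must be letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedFormatCheck {
    // Format from the comment: two digits, two word characters, two digits.
    // Note: \w matches [A-Za-z0-9_], so digits are accepted in the middle too.
    private static final Pattern FIXED = Pattern.compile("\\d\\d\\w\\w\\d\\d");

    /** Returns every substring of the OCR text that has the expected shape. */
    public static List<String> extractMatches(String ocrText) {
        List<String> found = new ArrayList<>();
        Matcher m = FIXED.matcher(ocrText);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    /** True if the whole OCR result (trimmed of surrounding whitespace) has the expected shape. */
    public static boolean isValid(String ocrText) {
        return FIXED.matcher(ocrText.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("12AB34\n")); // true
        System.out.println(isValid("12A834"));   // also true: \w matches the digit 8
        System.out.println(extractMatches("id: 12AB34, 99xy00")); // [12AB34, 99xy00]
    }
}
```

A post-filter like this cannot make Tesseract read characters more accurately; it only lets you reject or pick out results that do not fit the known format.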
1 Answer
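Set the `load_system_dawg` and `load_freq_dawg` variables to `F` (false), either through the C-style API:

```java
// net.sourceforge.tess4j.TessAPI1; handle is an initialized TessBaseAPI
TessAPI1.TessBaseAPISetVariable(handle, "load_system_dawg", "F");
TessAPI1.TessBaseAPISetVariable(handle, "load_freq_dawg", "F");
```

or through a `Tesseract` instance:

```java
// net.sourceforge.tess4j.Tesseract
instance.setTessVariable("load_system_dawg", "F");
instance.setTessVariable("load_freq_dawg", "F");
```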
Update:
Put the following in a file named, for example, `bazaar`, placed under the `configs` folder:

```
load_system_dawg F
load_freq_dawg F
```
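and then pass the name of the file to `setConfigs`:

```java
ITesseract instance = new Tesseract(); // net.sourceforge.tess4j
List<String> configs = Arrays.asList("bazaar"); // config file name only, no path
instance.setConfigs(configs);
```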
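If you would rather create the config file from code, the following sketch writes it to a `configs` folder inside a tessdata directory, which is where Tesseract typically looks for named config files. The class name `DictionaryOffConfig`, the method `writeConfig`, and the default `tessdata` path are hypothetical and not part of tess4j. Because the dictionaries are loaded when Tesseract initializes, supplying these variables through a config file at initialization is probably more dependable than setting them afterwards; treat that as an assumption to verify against your Tesseract version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class DictionaryOffConfig {
    // Variables from the answer; "F" disables each dictionary.
    static final List<String> LINES = Arrays.asList(
            "load_system_dawg F",
            "load_freq_dawg F");

    /**
     * Hypothetical helper: writes <tessdataDir>/configs/<name>, creating the
     * configs folder if needed, and returns the path of the written file.
     */
    public static Path writeConfig(Path tessdataDir, String name) throws IOException {
        Path configsDir = tessdataDir.resolve("configs");
        Files.createDirectories(configsDir);
        Path file = configsDir.resolve(name);
        Files.write(file, LINES, StandardCharsets.UTF_8); // one variable per line
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location; pass your real tessdata directory as the first argument.
        Path tessdata = Paths.get(args.length > 0 ? args[0] : "tessdata");
        Path written = writeConfig(tessdata, "bazaar");
        System.out.println("Wrote " + written.toAbsolutePath());
        // Then, with tess4j: instance.setConfigs(Arrays.asList("bazaar"));
    }
}
```

Point tess4j's data path (for example with `instance.setDatapath(...)`) at the same tessdata directory so that the name `bazaar` resolves to this file.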
References:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
http://tess4j.sourceforge.net/docs/docs-1.4/

nguyenq
I think you also have to provide empty user_words_suffix and user_pattern_suffix files – sliders_alpha Feb 05 '15 at 13:53
Where does `TessBaseAPISetVariable` come from? Which package? – Francisco Souza Sep 16 '19 at 13:24