The character recognition accuracy of my Tess4J OCR application is very low. I have heard that turning off the dictionary in Tess4J can improve accuracy by allowing individual characters to be recognized, but I don't know how to do it. Does anyone know how to turn off the dictionary in Tess4J?

Chalaka Ellawala
  • Hi, did you solve the problem? I have almost the same problem: I want to use a regex to improve recognition accuracy. For example, the text in the image has the fixed format `\d\d\w\w\d\d`. How can I do this with Tesseract in Java? – Bahramdun Adil Jan 04 '16 at 06:57

1 Answer

Set the `load_system_dawg` and `load_freq_dawg` variables to `F`, as follows:

TessBaseAPISetVariable(handle, "load_system_dawg", "F");
TessBaseAPISetVariable(handle, "load_freq_dawg", "F");

or, on the Tess4J `Tesseract` instance:

instance.setTessVariable("load_system_dawg", "F");
instance.setTessVariable("load_freq_dawg", "F");

Update:

Put the following in a file named, for example, `bazaar`, placed in the `configs` folder (under `tessdata`):

load_system_dawg     F
load_freq_dawg       F

and then pass the name of the file to the appropriate method:

List<String> configs = Arrays.asList("bazaar");
instance.setConfigs(configs);
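
If it is more convenient to create the config file from code, the step above could be sketched roughly like this. It uses only the Java standard library; the class name `DawgConfigWriter`, the method `writeConfig`, and the default `tessdata` location are hypothetical names chosen for illustration, so adjust the path to wherever your tessdata folder actually lives.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tess4J: writes a Tesseract config file
// that turns off the two dictionary (dawg) parameters named in the answer.
class DawgConfigWriter {

    // Creates <tessdataDir>/configs/<configName> and returns its path.
    static Path writeConfig(Path tessdataDir, String configName) throws IOException {
        Path configsDir = tessdataDir.resolve("configs");
        Files.createDirectories(configsDir); // no-op if the folder already exists

        // One "name value" pair per line, as in the config shown above.
        List<String> lines = Arrays.asList(
                "load_system_dawg F",
                "load_freq_dawg F");

        return Files.write(configsDir.resolve(configName), lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: tessdata sits in the working directory unless a path is given.
        Path tessdataDir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        Path written = writeConfig(tessdataDir, "bazaar");
        System.out.println("Wrote " + written);
    }
}
```

After the file exists, its name ("bazaar" here) is what gets passed to `setConfigs`, as in the snippet above.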

References:
https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
http://tess4j.sourceforge.net/docs/docs-1.4/

nguyenq