I have been using Tesseract (tess-two, to be more precise) to build an Android app that recognizes certain unconventional symbols. The purpose is to identify a symbol and redirect the user to that symbol's description.
The symbols are recognized almost perfectly, whether they are alone in the image or next to each other, except for two (the ones below).
[Image: symbols omitted from recognition]
Neither of these symbols is recognized when it is alone in the image, BUT BOTH ARE CORRECTLY RECOGNIZED when they are next to any other symbol.
For example:
Not recognized:
_
Correctly recognized:
_ b
_ y _
The problem is not that they are confused with other symbols; they are ignored completely. This happens when I call:
TessBaseAPI baseApi;
...
String text = baseApi.getUTF8Text();
The returned string is always null, as if Tesseract didn't even detect the black regions to begin with. Does anyone know how I could fix this?
UPDATE:
To make this clearer, here is my full code for initializing Tesseract and reading the results.
TessBaseAPI baseApi = new TessBaseAPI();
mainBitmap = mainBitmap.copy(Bitmap.Config.ARGB_8888, true);
baseApi.setDebug(true);
baseApi.init(MainActivity.DATA_PATH, MainActivity.lang);
baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_CHAR);
baseApi.setVariable("tessedit_char_whitelist","abcdefghijklmnopqrst");
baseApi.setImage(mainBitmap);
mainBitmap.recycle();
mainBitmap = null;
// Iterate through the results.
ResultIterator iterator = baseApi.getResultIterator();
String lastUTF8Text;
float lastConfidence;
iterator.begin();
do {
lastUTF8Text = iterator.getUTF8Text(TessBaseAPI.PageIteratorLevel.RIL_SYMBOL);
lastConfidence = iterator.confidence(TessBaseAPI.PageIteratorLevel.RIL_SYMBOL);
Log.i("string, intConfidence",lastUTF8Text+", "+lastConfidence);
} while (iterator.next(TessBaseAPI.PageIteratorLevel.RIL_SYMBOL));
My whitelist covers the range "a" to "t" because I made a font for the symbols I need to recognize and mapped each symbol to one of those letters.
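To illustrate that mapping, here is a minimal plain-Java sketch of how a recognized letter could be turned into a symbol description, and how a null or empty OCR result could be handled. The class `SymbolLookup`, its methods, and the `symbol_x` description keys are hypothetical placeholders, not the app's actual code, and it deliberately leaves out every tess-two call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: names and description keys are placeholders for illustration.
class SymbolLookup {

    // Builds the whitelist string for an inclusive character range, e.g. 'a'..'t'.
    static String buildWhitelist(char first, char last) {
        StringBuilder sb = new StringBuilder();
        for (char c = first; c <= last; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Maps each whitelisted letter to a placeholder description key.
    static Map<Character, String> buildDescriptions(String whitelist) {
        Map<Character, String> map = new LinkedHashMap<>();
        for (char c : whitelist.toCharArray()) {
            map.put(c, "symbol_" + c);
        }
        return map;
    }

    // Returns the description for the OCR text, or null when nothing usable was recognized.
    static String describe(String ocrText, Map<Character, String> descriptions) {
        if (ocrText == null) {
            return null;
        }
        String trimmed = ocrText.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        // Only the first character matters, since each symbol maps to one letter.
        return descriptions.get(trimmed.charAt(0));
    }

    public static void main(String[] args) {
        String whitelist = buildWhitelist('a', 't');
        Map<Character, String> descriptions = buildDescriptions(whitelist);
        System.out.println(whitelist);
        System.out.println(describe("b", descriptions));
        System.out.println(describe(null, descriptions));
    }
}
```

In the app, the string passed to `describe` would be whatever `getUTF8Text()` returns, so the two problem symbols currently end up in the null branch.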