I am using tess-two to build an OCR feature for Android. I ran the same image through both, but the result on Android is very different from what Tesseract produces on the desktop.

The desktop version of tesseract gives a better result.

I am using the following lines on Android:

  val baseApi = TessBaseAPI()
  baseApi.init(dirPath, "eng")
  baseApi.setImage(mustOpen)
  val recognizedText = baseApi.utF8Text

And on the desktop, I am using just this simple command:

tesseract image.png result

The sample image is:

this

The output for the image using tesseract for Desktop is:

VEGETABLE OF, RIVET een Sra) SUGAR, EDIBLE

VEGETABLE OIL, INVERT SUGAR S' SUGAR, CITRIC
RAISING 503 (ii), BAKING }, SALT,
SOLIDS (0.6 % [ DL-ACETYL TARTARIC

ACID ESTERS OF ‘AND

And, the output using tess-two for android is this:

'm mm W7 ' ' iii-E:
mmmmfiwgmb Ian»: came
a” ( om | mmmfiéu
mmormuguomws _

Won mm .. . . ml
mumm I'm‘n
( .

The Android output is complete gibberish. Please help.

  • Have you ever figured this out? I am facing the same issue and don't understand how to solve it completely. I already improved the result by using two languages (osd+eng), but it is still different from the desktop output. My understanding is that the tesseract command does some preprocessing that is perhaps missing in tess-two. This page helped me get as far as I have: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage#simplest-invocation-to-ocr-an-image – Hardcore_Graverobber Aug 19 '19 at 12:03

1 Answer

As I commented on your post, I had the same problem and have just solved it, so I thought I would share.

The first problem for me was that the image needs to be preprocessed to get better results. I'm using OpenCV for the preprocessing. A good example of how to set it up is here: https://android.jlelse.eu/a-beginners-guide-to-setting-up-opencv-android-library-on-android-studio-19794e220f3c

Then the image needs to be converted into a binary image. For me the following gives the best results:

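import android.graphics.Bitmap;
import org.opencv.android.Utils;
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

// Load the image as a BGR Mat (loadResource can throw IOException) and convert it to grayscale.
Mat plateMat = Utils.loadResource(this, R.drawable.plate);
Mat gray = new Mat();
Imgproc.cvtColor(plateMat, gray, Imgproc.COLOR_BGR2GRAY);
Mat blur = new Mat();
Imgproc.GaussianBlur(gray, blur, new Size(3, 3), 0);
// Adaptive mean threshold (block size 75, C = 10), then invert so the text is dark on a light background.
Mat thresh = new Mat();
Imgproc.adaptiveThreshold(blur, thresh, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY_INV, 75, 10);
Core.bitwise_not(thresh, thresh);
// Copy the binary image into a Bitmap that can be handed to tess-two.
Bitmap bmp = Bitmap.createBitmap(thresh.width(), thresh.height(), Bitmap.Config.ARGB_8888);
Utils.matToBitmap(thresh, bmp);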
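
To make it clearer what the threshold step does, here is a minimal plain-Java sketch of mean adaptive thresholding on a small grayscale array. It is not OpenCV's implementation and is not meant to match it bit for bit. The class `AdaptiveThresholdSketch`, its methods, and the 5x5 sample "page" are hypothetical names and data made up for illustration. It also suggests that, at least in this simplified model, `THRESH_BINARY_INV` followed by `bitwise_not` gives the same image as a single `THRESH_BINARY` pass.

```java
import java.util.Arrays;

class AdaptiveThresholdSketch {

    // Mean of the blockSize x blockSize window centred on (row, col).
    // Pixels outside the image are taken from the nearest edge pixel.
    static double localMean(int[][] gray, int row, int col, int blockSize) {
        int half = blockSize / 2;
        int h = gray.length, w = gray[0].length;
        long sum = 0;
        for (int dy = -half; dy <= half; dy++) {
            for (int dx = -half; dx <= half; dx++) {
                int y = Math.min(Math.max(row + dy, 0), h - 1);
                int x = Math.min(Math.max(col + dx, 0), w - 1);
                sum += gray[y][x];
            }
        }
        return (double) sum / (blockSize * blockSize);
    }

    // A pixel becomes 255 when it is brighter than (local mean - c), else 0.
    // With inverted = true the two outputs are swapped.
    static int[][] adaptiveMeanThreshold(int[][] gray, int blockSize, double c, boolean inverted) {
        if (blockSize < 3 || blockSize % 2 == 0) {
            throw new IllegalArgumentException("blockSize must be odd and >= 3");
        }
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean above = gray[y][x] > localMean(gray, y, x, blockSize) - c;
                out[y][x] = (above != inverted) ? 255 : 0;
            }
        }
        return out;
    }

    // Flips every pixel of an 8-bit binary image (0 <-> 255).
    static int[][] bitwiseNot(int[][] img) {
        int[][] out = new int[img.length][];
        for (int y = 0; y < img.length; y++) {
            out[y] = new int[img[y].length];
            for (int x = 0; x < img[y].length; x++) {
                out[y][x] = 255 - img[y][x];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a light page (200) with one dark "ink" pixel (40).
        int[][] page = new int[5][5];
        for (int[] row : page) Arrays.fill(row, 200);
        page[2][2] = 40;

        int[][] direct = adaptiveMeanThreshold(page, 3, 10, false);
        int[][] viaInverse = bitwiseNot(adaptiveMeanThreshold(page, 3, 10, true));

        for (int[] row : direct) System.out.println(Arrays.toString(row));
        System.out.println("same result both ways: " + Arrays.deepEquals(direct, viaInverse));
    }
}
```

The ink pixel ends up black and the background white, which is the dark-text-on-light-background form that is handed to Tesseract here.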

Then I call Tesseract with the eng+osd languages (in this order). You can find the trained data files here: https://github.com/tesseract-ocr/tessdata

The Tesseract call itself looks like this:

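import com.googlecode.tesseract.android.TessBaseAPI;

TessBaseAPI tesseract = new TessBaseAPI();
tesseract.setDebug(true);
// The data path must contain a "tessdata" folder holding eng.traineddata and osd.traineddata.
tesseract.init(getFilesDir().getAbsolutePath(), "eng+osd");
tesseract.setImage(bmp);
String utf8 = tesseract.getUTF8Text();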

NOW THE REAL DEAL

The real reason I got a different result in the end is simply that the Tesseract version installed with Homebrew on my Mac was 4.1.0, while the official tess-two repo still uses 3.05. By digging through the repo's issues I found that there is a newer version built on Tesseract 4, but it lives in a separate repository: https://github.com/adaptech-cz/Tesseract4Android

Once I cloned it and used the AAR built from the project, the results matched the desktop output and I can finally sleep in peace!
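
If you want to check for this kind of mismatch up front, a rough sketch like the one below compares the major version numbers of the two engines. On the desktop the version string comes from `tesseract --version`. On Android I am assuming the wrapper can report it, for example through a `getVersion()` method on `TessBaseAPI`, so please verify that against the library you actually use. The class `TesseractVersionCheck` and its methods are hypothetical helpers written for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TesseractVersionCheck {

    // Extracts the major version from strings such as "4.1.0", "3.05" or "tesseract 4.1.0".
    static int majorVersion(String version) {
        Matcher m = Pattern.compile("(\\d+)\\.").matcher(version);
        if (!m.find()) {
            throw new IllegalArgumentException("No version number in: " + version);
        }
        return Integer.parseInt(m.group(1));
    }

    // True when both engines share a major version; a difference here
    // (3.x versus 4.x) is what caused the mismatch described above.
    static boolean sameMajorVersion(String desktopVersion, String deviceVersion) {
        return majorVersion(desktopVersion) == majorVersion(deviceVersion);
    }

    public static void main(String[] args) {
        // Versions taken from this answer: Homebrew build versus tess-two.
        String desktop = "4.1.0";
        String device = "3.05";
        if (!sameMajorVersion(desktop, device)) {
            System.out.println("Major versions differ, so expect different OCR output.");
        }
    }
}
```

Comparing only the major number is a deliberate simplification: Tesseract 4 added an LSTM-based recognition engine, so a 3.x versus 4.x gap is the difference most likely to change results.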