I am working on an Android app that needs to recognize text in images. I chose Tesseract because it is open source and widely used. The app takes an image from the user and applies two or three preprocessing operations to improve the chances of a good result, but I am not getting the results I expect.
Java code:
// Copy the input image into a mutable ARGB_8888 bitmap.
final Bitmap tessBitmap = Bitmap.createBitmap(image.getWidth(), image.getHeight(), Bitmap.Config.ARGB_8888);
Canvas canvas = new Canvas(tessBitmap);
Paint paint = new Paint();
paint.setColor(Color.BLACK);
canvas.drawBitmap(image, 0, 0, paint);
// Convert to an OpenCV Mat, grayscale it, and binarize with Otsu's threshold.
Mat tessMat = new Mat();
Utils.bitmapToMat(tessBitmap, tessMat);
Imgproc.cvtColor(tessMat, tessMat, Imgproc.COLOR_RGB2GRAY);
Imgproc.threshold(tessMat, tessMat, 0, 255, Imgproc.THRESH_BINARY + Imgproc.THRESH_OTSU);
// Convert the binarized Mat back to a bitmap.
final Bitmap newTessBitmap = Bitmap.createBitmap(tessMat.width(), tessMat.height(), Bitmap.Config.RGB_565);
Utils.matToBitmap(tessMat, newTessBitmap);
// Redraw into an ARGB_8888 bitmap, which is what gets passed to Tesseract.
final Bitmap finalTessBitmap = Bitmap.createBitmap(newTessBitmap.getWidth(), newTessBitmap.getHeight(), Bitmap.Config.ARGB_8888);
Canvas tessCanvas = new Canvas(finalTessBitmap);
Paint tessPaint = new Paint();
tessPaint.setColor(Color.BLACK);
tessCanvas.drawBitmap(newTessBitmap, 0, 0, tessPaint);
I then pass this bitmap to Tesseract, but the output is inaccurate and sometimes I get no output at all. I compared my results with the online service https://www.newocr.com/, which claims to use Tesseract as its back end. I also tried contacting them by email but did not get a reply.
mTess = new TessBaseAPI();
tessModelPath = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS).getAbsolutePath() + "/tesseract/";
mTess.init(tessModelPath, "eng", TessBaseAPI.OEM_TESSERACT_ONLY);
mTess.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
mTess.setImage(finalTessBitmap);
This is the base Tesseract code. Please help me solve this issue. Thanks.
Below is the image I get after applying the operations above. When I pass it to Tesseract in my app I get no output, but when I pass the same image to the newocr.com website it produces the exact text.
Result from newocr.com.
This image is for results.
Please suggest what I can do if you have any ideas.
After digging further and running the same image through Python, I found that pytesseract works like a charm and produces exactly the same output as newocr.com. On Android, however, it does not work nearly as well, so the issue may be with the Tesseract API I am using there. If you know of anything else I can do to improve accuracy, please help. Thanks in advance.
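For reference, here is a minimal plain-Java sketch of what I understand the Otsu binarization step (Imgproc.THRESH_BINARY + Imgproc.THRESH_OTSU in the code above) to compute, so the preprocessing can be checked independently of Android and OpenCV. The class name OtsuSketch and the methods otsuThreshold and binarize are hypothetical names used only for illustration. This is not OpenCV's actual implementation, and the threshold it picks may differ slightly from OpenCV's in edge cases.

```java
import java.util.Arrays;

// Illustrative sketch only: a textbook Otsu threshold on 8-bit gray values.
// All names here are hypothetical and not part of OpenCV or Tesseract.
class OtsuSketch {

    // Returns the threshold (0-255) that maximizes the between-class variance.
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) {
            hist[v & 0xFF]++;
        }
        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * hist[i];
        }

        long weightBg = 0;      // number of pixels with value <= t
        long sumBg = 0;         // sum of those pixel values
        double bestVariance = -1.0;
        int bestThreshold = 0;
        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            sumBg += (long) t * hist[t];
            if (weightBg == 0) {
                continue;       // no pixels at or below t yet
            }
            long weightFg = total - weightBg;
            if (weightFg == 0) {
                break;          // every pixel is already at or below t
            }
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            double diff = meanBg - meanFg;
            double between = (double) weightBg * weightFg * diff * diff;
            if (between > bestVariance) {
                bestVariance = between;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    // Pixels strictly above the threshold become 255, all others become 0.
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = (gray[i] & 0xFF) > threshold ? 255 : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample: three dark pixels and three bright pixels.
        int[] gray = {10, 12, 11, 200, 205, 210};
        int t = otsuThreshold(gray);
        System.out.println("threshold = " + t);                   // prints "threshold = 12"
        System.out.println(Arrays.toString(binarize(gray, t)));   // prints "[0, 0, 0, 255, 255, 255]"
    }
}
```

With this rule, pixels brighter than the threshold come out white (255) and the rest black (0), so whether the text ends up dark on light or light on dark depends on the input image.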