I am using the Tesseract (tess-two) library in my Android application for real-time text detection. My code:

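    import android.graphics.Bitmap;
    import android.graphics.BitmapFactory;
    import android.graphics.ImageFormat;
    import android.graphics.Rect;
    import android.graphics.YuvImage;
    import android.hardware.Camera;
    import com.googlecode.tesseract.android.TessBaseAPI;
    import java.io.ByteArrayOutputStream;

    // Camera.PreviewCallback method. DATA_PATH, lang, extractedText and the
    // TextView DisplayResult are fields declared elsewhere in the class.
    public void onPreviewFrame(byte[] data, Camera camera) {
        try {
            // Convert the NV21 preview frame to JPEG, then decode it to a Bitmap
            Camera.Size previewSize = camera.getParameters().getPreviewSize();
            YuvImage yuvImage = new YuvImage(data, ImageFormat.NV21,
                    previewSize.width, previewSize.height, null);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            yuvImage.compressToJpeg(new Rect(0, 0, previewSize.width, previewSize.height), 60, baos);
            byte[] jdata = baos.toByteArray();

            // inSampleSize only applies if options is passed to the decoder;
            // downsampling by 4 also shrinks the text Tesseract has to read
            BitmapFactory.Options options = new BitmapFactory.Options();
            options.inSampleSize = 4;
            Bitmap bmp = BitmapFactory.decodeByteArray(jdata, 0, jdata.length, options);

            // init() loads the traineddata, which is slow to repeat on every
            // frame; end() releases the native memory held by the API
            TessBaseAPI baseApi = new TessBaseAPI();
            baseApi.init(DATA_PATH, lang);
            baseApi.setImage(bmp);
            extractedText = baseApi.getUTF8Text();
            baseApi.end();
            DisplayResult.setText(extractedText);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }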
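The conversion above goes through a JPEG at quality 60, which is lossy. As an alternative worth trying, an NV21 frame already contains an 8-bit grayscale image: its first width × height bytes are the luminance (Y) plane. The plain-Java sketch below (the class `Nv21Luma` is hypothetical, not part of tess-two or Android) copies that plane. It also shows a 90° clockwise rotation, on the assumption that the app runs in portrait: `Camera.setDisplayOrientation()` rotates only the on-screen preview, not the bytes passed to `onPreviewFrame()`, so text that looks upright on screen can reach Tesseract sideways.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: grayscale access to an NV21 camera preview frame.
 * NV21 stores a full-resolution luminance (Y) plane of width*height bytes,
 * followed by interleaved V/U chroma samples at quarter resolution.
 */
final class Nv21Luma {
    private Nv21Luma() {}

    /** Returns a copy of the Y plane, i.e. an 8-bit grayscale image. */
    static byte[] lumaPlane(byte[] nv21, int width, int height) {
        int size = width * height;
        if (nv21.length < size) {
            throw new IllegalArgumentException("frame is smaller than " + width + "x" + height);
        }
        return Arrays.copyOf(nv21, size);
    }

    /**
     * Rotates a width x height grayscale image 90 degrees clockwise.
     * The result is height pixels wide and width pixels tall.
     */
    static byte[] rotate90Clockwise(byte[] gray, int width, int height) {
        byte[] out = new byte[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Pixel (x, y) moves to (height - 1 - y, x) in the rotated image
                out[x * height + (height - 1 - y)] = gray[y * width + x];
            }
        }
        return out;
    }
}
```

The resulting bytes could then be handed to tess-two through its `setImage(byte[] imagedata, int width, int height, int bpp, int bpl)` overload, with `bpp = 1` and `bpl` equal to the image width (after rotation, the width is the original height). It is worth checking that overload against the tess-two version in use.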

I have no problem initialising Tesseract or setting the image, but the output is completely wrong; take a look at the screenshot. The TextView on top of the SurfaceView displays the Tesseract output.

[Screenshot: Tesseract output]

How do I solve this problem?

Martijn Pieters
  • Welcome to Tesseract! ;) To be honest, it's just like this. You can try processing the image and adjusting its quality, converting it to black and white, trying each page segmentation mode (PSM), etc. If I crop the image you provided to just the text, I get "MADE IN CHINA" perfectly, but Tesseract just can't manage the entire image. [See more tips here](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality), but don't hold your breath unless you can crop that image. – samiles May 09 '17 at 09:17
  • You need to pre-process the image to reduce the uneven illumination. – rmtheis Jul 13 '17 at 12:55
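The two comments suggest converting the frame to black and white and compensating for uneven illumination. The plain-Java sketch below works on a row-major grayscale byte array (for example the NV21 Y plane) and contrasts a single global threshold chosen with Otsu's method against a local threshold that compares each pixel with the mean of its neighbourhood, computed from an integral image. The class name `Binarize`, the window `radius` and the `percent` margin are illustrative assumptions, not tess-two or Tesseract settings. Tesseract also binarizes internally, so whether pre-binarizing helps has to be checked on real frames.

```java
/**
 * Hypothetical helper: two ways to turn an 8-bit grayscale image
 * (unsigned bytes, row-major) into black (0) and white (255).
 */
final class Binarize {
    private Binarize() {}

    /** Otsu's method: the threshold that maximises between-class variance. */
    static int otsuThreshold(byte[] gray) {
        int[] hist = new int[256];
        for (byte b : gray) hist[b & 0xFF]++;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (long) i * hist[i];

        long sumDark = 0;
        int countDark = 0;
        double bestVariance = -1;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            countDark += hist[t];            // pixels with value <= t
            sumDark += (long) t * hist[t];
            int countLight = gray.length - countDark;
            if (countDark == 0) continue;
            if (countLight == 0) break;
            double meanDark = (double) sumDark / countDark;
            double meanLight = (double) (sumAll - sumDark) / countLight;
            double diff = meanDark - meanLight;
            double variance = (double) countDark * countLight * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                best = t;
            }
        }
        return best;
    }

    /** Global threshold: values <= threshold become black, the rest white. */
    static byte[] global(byte[] gray, int threshold) {
        byte[] out = new byte[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = (byte) ((gray[i] & 0xFF) <= threshold ? 0 : 255);
        }
        return out;
    }

    /**
     * Local (adaptive) threshold: a pixel becomes black when it is more than
     * percent% darker than the mean of the (2*radius+1)^2 window around it.
     * Window sums come from an integral image, so cost does not grow with radius.
     */
    static byte[] local(byte[] gray, int width, int height, int radius, int percent) {
        int stride = width + 1;
        // integral[(y+1)*stride + (x+1)] = sum of gray over rows 0..y, columns 0..x
        long[] integral = new long[stride * (height + 1)];
        for (int y = 0; y < height; y++) {
            long rowSum = 0;
            for (int x = 0; x < width; x++) {
                rowSum += gray[y * width + x] & 0xFF;
                integral[(y + 1) * stride + x + 1] = integral[y * stride + x + 1] + rowSum;
            }
        }
        byte[] out = new byte[width * height];
        for (int y = 0; y < height; y++) {
            int y0 = Math.max(0, y - radius), y1 = Math.min(height - 1, y + radius);
            for (int x = 0; x < width; x++) {
                int x0 = Math.max(0, x - radius), x1 = Math.min(width - 1, x + radius);
                long count = (long) (x1 - x0 + 1) * (y1 - y0 + 1);
                long sum = integral[(y1 + 1) * stride + x1 + 1] - integral[y0 * stride + x1 + 1]
                        - integral[(y1 + 1) * stride + x0] + integral[y0 * stride + x0];
                long value = gray[y * width + x] & 0xFF;
                // value < mean * (100 - percent) / 100, in integer arithmetic
                boolean dark = value * count * 100 < sum * (100 - percent);
                out[y * width + x] = (byte) (dark ? 0 : 255);
            }
        }
        return out;
    }
}
```

On a frame with a shadow across it, a single global threshold tends to turn the whole shadowed area black, while the local version only darkens pixels that stand out from their own neighbourhood, which is the kind of correction the second comment asks for.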

1 Answer


A few things that might improve your output:

  • crop the image to the area containing the text before running OCR
  • exclude punctuation and other characters you don't need from the text processing
TheMetal
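The answer's two suggestions can be sketched in plain Java under some assumptions. Cropping works on any row-major grayscale array, such as the NV21 Y plane, given a rectangle for the text region; how that rectangle is chosen (a fixed on-screen guide box, for instance) is left open. For excluding characters, Tesseract has the `tessedit_char_whitelist` and `tessedit_char_blacklist` variables, which tess-two lets you set through `TessBaseAPI.setVariable()`. The sketch instead shows a simpler post-processing filter on the returned text, which is a different technique: it removes unwanted characters after recognition rather than stopping Tesseract from guessing them. The class name `RoiAndFilter` and the allowed character set in the example are hypothetical.

```java
/**
 * Hypothetical helpers: crop a grayscale image to a text region, and keep
 * only expected characters in the OCR result.
 */
final class RoiAndFilter {
    private RoiAndFilter() {}

    /** Copies the cropW x cropH rectangle at (left, top) out of a width x height image. */
    static byte[] crop(byte[] gray, int width, int height,
                       int left, int top, int cropW, int cropH) {
        if (left < 0 || top < 0 || cropW <= 0 || cropH <= 0
                || left + cropW > width || top + cropH > height) {
            throw new IllegalArgumentException("crop rectangle outside image");
        }
        byte[] out = new byte[cropW * cropH];
        for (int row = 0; row < cropH; row++) {
            // Copy one row of the region at a time
            System.arraycopy(gray, (top + row) * width + left, out, row * cropW, cropW);
        }
        return out;
    }

    /**
     * Drops every character not in allowed, turns whitespace into spaces,
     * collapses runs of spaces to one and trims the result.
     */
    static String keepOnly(String text, String allowed) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (allowed.indexOf(c) >= 0) {
                sb.append(c);
            } else if (Character.isWhitespace(c)) {
                sb.append(' ');
            }
        }
        return sb.toString().replaceAll(" {2,}", " ").trim();
    }
}
```

In the app, the crop would be applied before the image is passed to `setImage()`, and the filter to the string returned by `getUTF8Text()`; for instance, `keepOnly("MADE, IN\nCHINA!!", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")` yields `"MADE IN CHINA"`.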