
I have integrated Firebase ML Kit in my Android application. I'm using the on-device TextRecognizer API to detect text on the live camera feed. It detects the text, but it takes a long time to process each image (from 300 milliseconds up to 1000 milliseconds). Because of this large latency, the overlay is not smooth the way it is in the Google Lens app.

What can I do so that the detected text overlay transitions smoothly between frames, even when frames are processed with high latency?

Also, I noticed that the Google Lens app detects text as whole sentences instead of showing blocks of text. How is the Google Lens app able to detect text as sentences/paragraphs?

Ben Weiss
Dharmendra

1 Answer


I'm assuming you have seen the performance tips in the API docs. One thing that is not mentioned there is that the amount of text in an image has a big impact on latency. A single line of text, even at the same image resolution, takes much less time to process than a page of a book.

If you don't need to recognize all the text in the camera view, run recognition on only a small section of the screen, as in the sketch below. It may help to take a look at the ML Kit Translate demo with Material Design, which uses this "trick" to get great performance.
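For illustration, here is a minimal Kotlin sketch of that approach, assuming each camera frame is available as a Bitmap. The helper name recognizeCenterStrip and the crop fractions are hypothetical; the recognizer calls are the standard Firebase ML Kit on-device text recognition API.

```kotlin
import android.graphics.Bitmap
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.text.FirebaseVisionText

// Hypothetical helper: crop each camera frame to a narrow center strip
// before recognition, so the on-device recognizer sees far less text.
fun recognizeCenterStrip(frame: Bitmap, onResult: (FirebaseVisionText) -> Unit) {
    // Keep only a horizontal band around the center of the frame
    // (the region the user is pointing at). The 1/5 height is an
    // illustrative choice; tune it for your UI.
    val stripHeight = frame.height / 5
    val cropped = Bitmap.createBitmap(
        frame,
        0,                                 // x: keep the full width
        (frame.height - stripHeight) / 2,  // y: vertically centered band
        frame.width,
        stripHeight
    )

    val image = FirebaseVisionImage.fromBitmap(cropped)
    FirebaseVision.getInstance().onDeviceTextRecognizer
        .processImage(image)
        .addOnSuccessListener { result -> onResult(result) }
        .addOnFailureListener { e -> /* log or skip failed frames */ }
}
```

Note that the bounding boxes in the result are relative to the cropped bitmap, so offset them by the crop origin before drawing the overlay on the full camera preview.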

To your second point: Google Lens uses updated text recognition models that do a better job of grouping blocks of text into paragraphs. We hope to adopt these new models in ML Kit soon. In addition, we are looking at hardware acceleration to ensure real-time experiences with large amounts of text can be achieved.

Chrisito