I'm trying to detect objects and text with Firebase ML Kit on a live camera feed in Android. There are specific recognizers (FirebaseVisionTextRecognizer, FirebaseVisionObjectDetector) to process the image. If I use these recognizers one at a time, each works fine and I get the desired response.

However, I want to detect both objects and text simultaneously from the same camera feed, the way the Google Lens app does. To achieve this, I first tried running both recognizers together, but the latency (the time taken to process a single frame) is higher because both run sequentially, and hence only text detection was working but not the object detection. That is, object detection returns no results.

Then I tried running both recognizers in parallel. The latency decreased, but not enough for the detection API to return a response. When there is no text in the camera feed, object detection works well, but when there is text in the feed, the latency increases and no objects are tracked.

Note: I checked the latency of the code that executes after an object is detected, and it doesn't take much time. It is the recognizers that take longer to process the image when run in parallel. I'm testing on a Samsung Galaxy S30s phone, and I don't think its processor is that slow.

A few details from the code:

  1. Using FirebaseVisionObjectDetectorOptions.STREAM_MODE with enableMultipleObjects=false and enableClassification=false for object detection
  2. Using the FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21 format when building FirebaseVisionImageMetadata
  3. Following the best practices defined by Google, dropping the latest frames while a detection is in progress
  4. Using OnDeviceObjectDetector for object detection
  5. Using OnDeviceTextRecognizer for text detection
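
For reference, the frame dropping in item 3 can be sketched roughly as below. This is a minimal, plain-Java illustration and not ML Kit code: `FrameThrottle` and its methods are hypothetical names, and the sketch assumes the detector reports completion through a callback in which `release()` can be called.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of ML Kit): lets one frame through at a time
// and counts the frames that arrive while a detection is still running.
final class FrameThrottle {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final AtomicInteger dropped = new AtomicInteger();

    /** Returns true if the caller may process this frame; false means drop it. */
    boolean tryAcquire() {
        if (busy.compareAndSet(false, true)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    /** Call when the detector finishes, on both success and failure. */
    void release() {
        busy.set(false);
    }

    /** Number of frames rejected so far. */
    int droppedCount() {
        return dropped.get();
    }
}
```

In the camera preview callback, a frame would be passed to the detector only when `tryAcquire()` returns true, and `release()` would be called from the detector's completion listeners.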

I need help understanding how the Google Lens app runs multiple recognizers together when my application cannot. What can I do to run multiple recognizers on the same camera frame?

Ben Weiss
Dharmendra

1 Answer

For now, the way to run multiple detectors on the same image frame is to run them sequentially, because we internally run them in a single thread. We are actively adding support for running different detectors in parallel.
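
As a rough illustration of that sequential pattern, here is a self-contained sketch that starts the second detector only after the first has finished with the same frame. It uses `CompletableFuture` as a stand-in for the asynchronous results the detectors return; `SequentialDetectors`, `FrameResult`, and the two detector functions are hypothetical names, not ML Kit APIs.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch: the detectors are modelled as functions from a frame
// buffer to a future result, standing in for the real recognizers.
final class SequentialDetectors {

    /** Combined output of both detectors for one frame. */
    static final class FrameResult {
        final String text;
        final List<String> objects;

        FrameResult(String text, List<String> objects) {
            this.text = text;
            this.objects = objects;
        }
    }

    /** Runs text detection first, then object detection on the same frame. */
    static CompletableFuture<FrameResult> process(
            byte[] frame,
            Function<byte[], CompletableFuture<String>> textDetector,
            Function<byte[], CompletableFuture<List<String>>> objectDetector) {
        return textDetector.apply(frame)
                // The object detector is invoked only once the text result is available.
                .thenCompose(text -> objectDetector.apply(frame)
                        .thenApply(objects -> new FrameResult(text, objects)));
    }
}
```

With the Play services `Task` type, a similar chain could presumably be built with `continueWithTask`; the next camera frame would be accepted only after the combined result completes.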

...as both runs sequentially and hence only text detection was working but not the Object detection.

The ObjectDetection feature with STREAM_MODE expects the latency between two image frames to be small, say < 300ms. If you run text recognition in between, the latency may be too long for the ObjectDetection feature to function properly. You may change STREAM_MODE to SINGLE_IMAGE_MODE to get results in your setting, but the latency would be higher.
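
To check whether your pipeline stays within that window, you could log the gap between consecutive frames handed to the object detector. The sketch below is plain Java with hypothetical names (`FrameGapMonitor`, `onFrameSubmitted`); the 300 ms figure is only the rough value mentioned above, not a documented threshold.

```java
// Hypothetical helper: tracks the time between frames submitted to the
// object detector and flags gaps that reach or exceed a chosen budget.
final class FrameGapMonitor {
    private final long budgetMillis;
    private long lastFrameMillis = -1; // -1 means no frame has been seen yet

    FrameGapMonitor(long budgetMillis) {
        this.budgetMillis = budgetMillis;
    }

    /**
     * Records a frame submitted at nowMillis. Returns true if the gap since the
     * previous frame is below the budget; the first frame always returns true.
     */
    boolean onFrameSubmitted(long nowMillis) {
        boolean withinBudget =
                lastFrameMillis < 0 || nowMillis - lastFrameMillis < budgetMillis;
        lastFrameMillis = nowMillis;
        return withinBudget;
    }
}
```

On Android, `SystemClock.elapsedRealtime()` would be a reasonable time source to pass in. If the method frequently returns false while text recognition runs between frames, that would be consistent with STREAM_MODE tracking failing.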

Shiyu
  • How does the Google Lens app manage to perform multiple scans? Also, how is the detection that smooth? In text detection, my latency goes up to 700ms, so the graphic overlay does not render that smoothly. – Dharmendra May 04 '20 at 08:13
  • 700ms for text detection is too high in my understanding. What are your testing device, image resolution and image format? – Shiyu May 04 '20 at 16:28
  • It's 640x480 with the ImageFormat.NV21 image format. I'm testing on a Samsung Galaxy S30s phone and the latency ranges from 400 milliseconds to 800 milliseconds. I tried this with just the text detection processor. @Shiyu – Dharmendra May 05 '20 at 14:12
  • I guess Google Lens may run different detectors in parallel, which cannot be achieved in current ML Kit versions. Are you using Camera1? On my Pixel 4 XL device, I could run text detection in around 80ms with a similar format and image size. – Shiyu May 08 '20 at 00:13
  • Also, are you doing image throttling, say dropping new frames when the previous frame is not finished? – Shiyu May 08 '20 at 00:13
  • Yes, I'm dropping frames if the previous is not finished. – Dharmendra May 08 '20 at 11:11
  • Could you try text recognition with the ML Kit sample app to see what latency you get? https://github.com/firebase/quickstart-android/tree/master/mlkit – Shiyu May 08 '20 at 17:07