I'm trying to detect objects and text with Firebase ML Kit on a live camera feed in Android. There are specific recognizers (FirebaseVisionTextRecognizer, FirebaseVisionObjectDetector) to process the image. If I use these recognizers one at a time, everything works fine and I get the desired response.
However, I want to detect both objects and text simultaneously on the same camera feed, just like the Google Lens app. To achieve this, I first tried running both recognizers sequentially on each frame, but the latency (the time taken to process a frame) increased so much that only text detection worked; object detection returned no results at all.
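A simplified sketch of the sequential attempt (the listener bodies are placeholders, the detectors are created elsewhere as described in the outline below, and the order of the two recognizers here is just an example):

```kotlin
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.objects.FirebaseVisionObjectDetector
import com.google.firebase.ml.vision.text.FirebaseVisionTextRecognizer

// Sequential attempt: object detection only starts after text recognition finishes,
// so the per-frame latency is the sum of both recognizers.
fun processFrameSequentially(
    image: FirebaseVisionImage,
    textRecognizer: FirebaseVisionTextRecognizer,
    objectDetector: FirebaseVisionObjectDetector
) {
    textRecognizer.processImage(image)
        .addOnSuccessListener { text ->
            // Text results arrive first, then object detection runs on the same frame.
            objectDetector.processImage(image)
                .addOnSuccessListener { objects -> /* update the object overlay */ }
                .addOnFailureListener { e -> /* log the error */ }
        }
        .addOnFailureListener { e -> /* log the error */ }
}
```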
Then I tried running both recognizers in parallel. The latency decreased, but still not enough for the detection APIs to return results reliably. When there is no text in the camera feed, object detection works well, but as soon as text appears in the frame the latency rises again and no objects are tracked.
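Here is a simplified sketch of the parallel attempt; in this sketch `Tasks.whenAllComplete` is used to know when both recognizers have finished with the frame, and `onFrameDone` is a placeholder for whatever marks the frame as processed:

```kotlin
import com.google.android.gms.tasks.Tasks
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.objects.FirebaseVisionObjectDetector
import com.google.firebase.ml.vision.text.FirebaseVisionTextRecognizer

// Parallel attempt: both recognizers are started on the same frame immediately.
fun processFrameInParallel(
    image: FirebaseVisionImage,
    textRecognizer: FirebaseVisionTextRecognizer,
    objectDetector: FirebaseVisionObjectDetector,
    onFrameDone: () -> Unit
) {
    val textTask = textRecognizer.processImage(image)
        .addOnSuccessListener { text -> /* update the text overlay */ }

    val objectTask = objectDetector.processImage(image)
        .addOnSuccessListener { objects -> /* update the object overlay */ }

    // Only accept the next camera frame once both recognizers have finished.
    Tasks.whenAllComplete(textTask, objectTask)
        .addOnCompleteListener { onFrameDone() }
}
```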
Note: I measured the latency of the code that runs after a detection completes, and it doesn't take much time; it's the recognizers themselves that take longer to process the image when run in parallel. I'm testing on a Samsung Galaxy S30s phone, which I don't think has a particularly weak processor.
A few notes about the code:
- Using FirebaseVisionObjectDetectorOptions.STREAM_MODE with enableMultipleObjects = false and enableClassification = false for object detection (see the configuration sketch after this list)
- Using the FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21 format while building FirebaseVisionImageMetadata
- As per the best practices defined by Google, dropping incoming frames while a detection is still in progress
- Using OnDeviceObjectDetector for object detection
- Using OnDeviceTextRecognizer for text detection
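For reference, a simplified sketch of how the detectors, the NV21 image metadata, and the frame-dropping guard are set up (width, height, rotation and the isProcessing flag are placeholders for my actual camera values and state):

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.common.FirebaseVisionImageMetadata
import com.google.firebase.ml.vision.objects.FirebaseVisionObjectDetectorOptions

// STREAM_MODE detector; multiple objects and classification stay disabled (the defaults).
val objectDetectorOptions = FirebaseVisionObjectDetectorOptions.Builder()
    .setDetectorMode(FirebaseVisionObjectDetectorOptions.STREAM_MODE)
    .build()

val objectDetector = FirebaseVision.getInstance().getOnDeviceObjectDetector(objectDetectorOptions)
val textRecognizer = FirebaseVision.getInstance().onDeviceTextRecognizer

// Guard used to drop camera frames that arrive while a detection is still running.
val isProcessing = AtomicBoolean(false)

fun onCameraFrame(nv21Bytes: ByteArray, width: Int, height: Int, rotation: Int) {
    // Drop this frame if the previous one is still being processed.
    if (!isProcessing.compareAndSet(false, true)) return

    val metadata = FirebaseVisionImageMetadata.Builder()
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setWidth(width)
        .setHeight(height)
        .setRotation(rotation) // one of FirebaseVisionImageMetadata.ROTATION_*
        .build()
    val image = FirebaseVisionImage.fromByteArray(nv21Bytes, metadata)

    // Only the object detector is shown here; the text recognizer is started the same way.
    objectDetector.processImage(image)
        .addOnSuccessListener { objects -> /* handle detected objects */ }
        .addOnCompleteListener { isProcessing.set(false) } // accept new frames again
}
```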
I need help understanding how the Google Lens app manages to run multiple recognizers together when my application can't. What can I do to run multiple recognizers on the same camera frame?