Using OCR mobile vision to anchor image to detected text

Question

I am using Google's Text Recognition (Mobile Vision/ML) to detect text on the camera feed. Once I detect text and confirm that it equals "HERE WE GO", I draw a heart shape beside the detected text using the boundaries passed back by the detector.

The problem I am facing is that the shape jumps around and lags behind the text. I want it to look anchored to the detected text. Is there something I can do to improve that?

I have heard about the ARCore library, but it seems to rely on existing images to determine the anchor, whereas in my case the anchor can be any text that matches "HERE WE GO".

Any suggestions?

Is the text on an object in the video, like a poster or a drink bottle, or is it standalone text like subtitles? – Mick, Jun 10 '19 at 11:20
I am referring to text on a live camera feed, say, pointing your camera at a poster. – Snake, Jun 11 '19 at 20:57

score 1 · Accepted Answer · answered Jun 10 '19 at 17:39

I believe you are trying to draw an overlay on the camera preview in real time. There will be a small delay between the camera input and the detection result. Since the API is asynchronous, by the time the result comes back the preview is already showing a later frame. To alleviate that, you can either make the processing synchronous using a lock/mutex, or overlay another image that only refreshes after the processing is done. We have some examples here: https://github.com/firebase/quickstart-android/tree/master/mlkit

I also fixed a similar problem on iOS by using DispatchGroup: https://github.com/googlecodelabs/mlkit-ios/blob/master/translate/TranslateDemo/CameraViewController.swift#L245
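A minimal sketch of the second suggestion (refresh the displayed image only after processing is done), in plain Java with no Android or ML Kit calls. `FrameGate`, `AnnotatedFrame`, `tryStart`, `finish`, and `abort` are hypothetical names made up for illustration and are not taken from the linked samples. The idea is to let only one frame be in detection at a time and to publish the frame and its box together, so the overlay is always drawn on the frame it was computed from.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** A frame id paired with the box detected on that same frame (hypothetical type). */
final class AnnotatedFrame {
    final long frameId;
    final int[] box; // left, top, right, bottom in frame coordinates

    AnnotatedFrame(long frameId, int[] box) {
        this.frameId = frameId;
        this.box = box.clone();
    }
}

/** Lets one frame through at a time and publishes frame + box as a unit (hypothetical helper). */
final class FrameGate {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final AtomicReference<AnnotatedFrame> shown = new AtomicReference<>();

    /** Call for every camera frame; false means a detection is still running and the frame is dropped. */
    boolean tryStart() {
        return busy.compareAndSet(false, true);
    }

    /** Call from the detector's success callback with the id of the frame it processed. */
    void finish(long frameId, int[] box) {
        shown.set(new AnnotatedFrame(frameId, box));
        busy.set(false);
    }

    /** Call from the detector's failure callback so the gate does not stay closed. */
    void abort() {
        busy.set(false);
    }

    /** What the UI should draw: the last fully processed frame and its box, or null before the first result. */
    AnnotatedFrame current() {
        return shown.get();
    }
}
```

The renderer would then draw the bitmap belonging to `current().frameId` with the heart placed at `current().box`, instead of drawing on top of the live preview. The overlay no longer trails the picture, at the cost of the picture itself being shown slightly late.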

Ibrahim Ulukaya

Hi Ibrahim, can you please point me to which example alleviates the delay? I actually used the text recognition project as my sample, but as you move the camera the text jumps as it follows the detected boundaries. I thought AR superimposes images that are anchored to a location and move together with it. – Snake Jun 11 '19 at 20:56
Any comment? Thanks – Snake Jun 12 '19 at 19:30
Yes, there is a choice to make between displaying camera frames in real time, with tracking that lags behind, and rendering synchronously, with the camera frames shown after some delay. – rds Jun 14 '19 at 14:56
@Ibrahim, are you saying that this exists in the sample code? When I run the sample code from the link above, the detections jump and lag behind. If you are suggesting a way, is there a code sample that demonstrates it, i.e. the sync and rendering? The Google Lens app, for example, anchors a dot to detected text, and I am not sure how to do that. – Snake Jun 15 '19 at 02:54
Hi Ibrahim, I awarded you the bounty for being the closest answer, but I would really appreciate more info or an example as per my last comment. – Snake Jun 16 '19 at 04:40

score 1 · Answer 2 · answered Jun 14 '19 at 03:28

Option 1: Refer to the TensorFlow Android sample here: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android

Especially these classes:

1. Object tracker: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/tracking/ObjectTracker.java

2. Overlay: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/OverlayView.java

3. Camera Activity and Camera Fragment: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/CameraActivity.java
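A rough sketch of why a tracker can help with the lag, in plain Java. This is an assumption about the general idea (move the last detected box with the image between detections, and catch a late detection up to the current frame); it is not the API of the linked ObjectTracker class. `LatencyCompensatingTracker`, `onFrame`, and `onDetection` are hypothetical names, and the per-frame shift `(dx, dy)` is assumed to come from some motion estimate such as optical flow, which is not computed here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps a box glued to the image between slow detections (hypothetical helper, single-threaded use assumed). */
final class LatencyCompensatingTracker {
    private static final int MAX_HISTORY = 120; // how many per-frame shifts to remember

    private static final class Motion {
        final long frameId;
        final float dx, dy;
        Motion(long frameId, float dx, float dy) {
            this.frameId = frameId;
            this.dx = dx;
            this.dy = dy;
        }
    }

    private final Deque<Motion> history = new ArrayDeque<>();
    private float[] box; // left, top, right, bottom; null until the first detection

    /** Record the image shift from the previous frame to frameId and move the current box with it. */
    void onFrame(long frameId, float dx, float dy) {
        history.addLast(new Motion(frameId, dx, dy));
        if (history.size() > MAX_HISTORY) {
            history.removeFirst();
        }
        if (box != null) {
            shift(box, dx, dy);
        }
    }

    /** A detection computed on an older frame arrives: replay every shift recorded after that frame. */
    void onDetection(long detectedFrameId, float[] detectedBox) {
        float[] updated = detectedBox.clone();
        for (Motion m : history) {
            if (m.frameId > detectedFrameId) {
                shift(updated, m.dx, m.dy);
            }
        }
        box = updated;
    }

    /** Box to draw on the frame currently on screen, or null if nothing has been detected yet. */
    float[] currentBox() {
        return box == null ? null : box.clone();
    }

    private static void shift(float[] b, float dx, float dy) {
        b[0] += dx;
        b[1] += dy;
        b[2] += dx;
        b[3] += dy;
    }
}
```

With this shape, the text detector can stay asynchronous and slow: each result is corrected for the motion that happened while it was being computed, and the box keeps moving on every preview frame until the next result arrives.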

Option 2: Sample code can be found in the codelab below. They do something similar for barcodes.

https://codelabs.developers.google.com/codelabs/barcodes/index.html?index=..%2F..index#0

Thank you, but that seems to answer a different question. I am not asking how to track or detect. I am asking how to anchor the tracking in real time, without the displayed content lagging behind, like in AR. – Snake, Jun 15 '19 at 02:52
