
I've read Apple's official RealtimeNumberReader sample. It uses AVCaptureSession together with layerRectConverted(fromMetadataOutputRect:), a method that exists only on AVCaptureVideoPreviewLayer, to convert a bounding box into screen coordinates:

let rect = layer.layerRectConverted(fromMetadataOutputRect: box.applying(self.visionToAVFTransform))

Now I want to recognize text in an ARFrame's capturedImage and then display the bounding box on screen. Is that possible?

I know how to recognize text in a single image from the official tutorial; my problem is how to convert the normalized bounding-box coordinates to viewport coordinates.

Please help and thank you very much!!!

2 Answers


Based on @Banane42's answer, I worked out the theory behind ARKit and VNRecognizeTextRequest:

  1. An ARKit sceneview's capturedImage is wider than what you can see on screen. To verify this, I made a small app with an imageView that displays the whole capturedImage, with the sceneview area shown in front of it: the captured image extends beyond the visible region.
     (image demonstrating that ARKit's capturedImage is larger than the sceneview)
  2. The coordinate system of the sceneview (and of the image) originates at the top-left corner, with the x-axis pointing right and the y-axis pointing down. But the boundingBox that a VNRequest returns originates at the bottom-left corner, with the x-axis pointing right and the y-axis pointing up.
  3. If you use request.regionOfInterest, the ROI must be given in normalized coordinates with respect to the whole image, and the boundingBox the VNRequest returns is in normalized coordinates with respect to that ROI.
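Points 2 and 3 can be expressed as a small helper. This is a hedged sketch of my own (the function name and parameters are mine, not from the answer): it un-normalizes a Vision boundingBox from ROI space into whole-image normalized space, flips the y-axis to the top-left-origin convention, and finally scales up to pixel coordinates.

```swift
import Foundation

// Hypothetical helper illustrating points 2 and 3. `box` is a Vision
// boundingBox: normalized, bottom-left origin, relative to the ROI.
// `roi` is the regionOfInterest, normalized relative to the whole image.
// The result is a rect in image pixel coordinates with a top-left origin.
func imageRect(forVisionBox box: CGRect,
               regionOfInterest roi: CGRect,
               imageSize: CGSize) -> CGRect {
    // 1. Un-normalize from ROI space to whole-image normalized space.
    var r = CGRect(x: roi.origin.x + box.origin.x * roi.width,
                   y: roi.origin.y + box.origin.y * roi.height,
                   width: box.width * roi.width,
                   height: box.height * roi.height)
    // 2. Flip the y-axis: Vision's origin is bottom-left, images use top-left.
    r.origin.y = 1 - r.origin.y - r.height
    // 3. Scale from normalized coordinates to pixels.
    return CGRect(x: r.origin.x * imageSize.width,
                  y: r.origin.y * imageSize.height,
                  width: r.width * imageSize.width,
                  height: r.height * imageSize.height)
}
```

For example, a box hugging the bottom of the image (Vision y = 0) comes out near the maximum y in image coordinates, which is what the flip in step 2 is for.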

Finally, I got my app working properly. The conversion is quite involved, so be careful!
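On a real device, point 1 is normally handled by ARFrame.displayTransform(for:viewportSize:), which maps normalized image coordinates into normalized viewport coordinates. To make the geometry behind it explicit, here is a standalone sketch of my own (not the author's code) of the aspect-fill math: the capturedImage is scaled to completely cover the viewport, so the overhanging band of the wider dimension ends up off-screen. It assumes the image and viewport share the same orientation and works directly in pixels/points.

```swift
import Foundation

// Hypothetical illustration of point 1. `rect` is in image pixel
// coordinates (top-left origin); the result is in viewport points.
// Assumes image and viewport have the same orientation; on-device you
// would use ARFrame.displayTransform(for:viewportSize:) instead.
func viewportRect(forImageRect rect: CGRect,
                  imageSize: CGSize,
                  viewportSize: CGSize) -> CGRect {
    // Aspect-fill: scale so the image covers the entire viewport.
    let scale = max(viewportSize.width / imageSize.width,
                    viewportSize.height / imageSize.height)
    // The scaled image overhangs the viewport; the crop is centered,
    // which is why part of the capturedImage is never visible.
    let xOffset = (imageSize.width * scale - viewportSize.width) / 2
    let yOffset = (imageSize.height * scale - viewportSize.height) / 2
    return CGRect(x: rect.origin.x * scale - xOffset,
                  y: rect.origin.y * scale - yOffset,
                  width: rect.width * scale,
                  height: rect.height * scale)
}
```

Note that a rect near the cropped edge of the image can legitimately come back with a negative origin, meaning it lies outside the visible viewport.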


Try looking at this git repo. Having messed with it myself, it is not the most performant, but it should give you a start.

Banane42
    Thank you very much! Based on your example, I've finally figured out the solution. I'll post another answer to explain it for other people who may need it. – Chengxing Zhang Mar 30 '21 at 14:41