
I am in the process of writing an algorithm to carve rectangular shapes out of images for processing on iOS, using Apple's Vision framework. VNDetectRectanglesRequest mostly works and accurately detects the shapes in question, but regardless of how clean the shape is, the four corner points always leave a margin of padding between the detected contour and the actual shape.

It does this for real-world shapes, and exactly the same thing happens on an experimental shape I drew, shown below. The image illustrates how the detector leaves a margin between the detected shape (in red) and the actual shape.

I was wondering if anyone with more experience with this framework can say whether these detected shapes can be tightened up. My first thought is that the margins look about even on all sides, so I could hack in a transform that shrinks the detected rectangle by some constant pad, but that's obviously a dirty solution likely to break whenever Apple alters the framework.
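As a rough sketch of that hack: nudge each corner toward the quad's centroid by a fixed amount in Vision's normalized space. The pad value is a made-up constant I'd have to tune empirically, and insetCorners is just an illustrative helper, not a Vision API:

import Vision
import CoreGraphics

// Sketch of the constant-pad hack: pull each detected corner toward the
// centroid of the quad by `pad` (in normalized units, so 0.01 = 1% of the
// image dimension). `pad` is a guess to be tuned, not a derived value.
func insetCorners(of observation: VNRectangleObservation, by pad: CGFloat) -> [CGPoint] {
    let corners = [observation.topLeft, observation.topRight,
                   observation.bottomRight, observation.bottomLeft]
    let cx = corners.map(\.x).reduce(0, +) / 4
    let cy = corners.map(\.y).reduce(0, +) / 4
    return corners.map { p in
        let dx = cx - p.x, dy = cy - p.y
        let len = max((dx * dx + dy * dy).squareRoot(), .leastNonzeroMagnitude)
        return CGPoint(x: p.x + dx / len * pad, y: p.y + dy / len * pad)
    }
}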

I'm hoping others have corrected for this before, because the carved-out shapes come out warped after the perspective transform due to this padding, and I imagine this Vision algorithm is meant for carving out subimages for processing.
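For context, the carve-out itself is just the standard CIPerspectiveCorrection pattern, roughly this (assuming a CIImage and one observation already in hand):

import Vision
import CoreImage

// Crop and de-warp the region described by a rectangle observation.
// Vision's normalized coordinates are scaled up to the image's pixel size;
// both spaces are lower-left origin, so no Y-flip is needed for CIImage.
func carve(_ image: CIImage, with rect: VNRectangleObservation) -> CIImage {
    let size = image.extent.size
    func scaled(_ p: CGPoint) -> CGPoint {
        CGPoint(x: p.x * size.width, y: p.y * size.height)
    }
    return image.applyingFilter("CIPerspectiveCorrection", parameters: [
        "inputTopLeft": CIVector(cgPoint: scaled(rect.topLeft)),
        "inputTopRight": CIVector(cgPoint: scaled(rect.topRight)),
        "inputBottomLeft": CIVector(cgPoint: scaled(rect.bottomLeft)),
        "inputBottomRight": CIVector(cgPoint: scaled(rect.bottomRight)),
    ])
}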

I used the following parameters for the rectangle detection request. I looked through all the documentation and could not find any tolerance or padding parameter that would tune the algorithm to wrap the detected shape more tightly.

import Vision

let rectDetectRequest = VNDetectRectanglesRequest()
rectDetectRequest.maximumObservations = 10
rectDetectRequest.quadratureTolerance = 15.0  // degrees the corner angles may deviate from 90°
rectDetectRequest.minimumConfidence = 0.6
rectDetectRequest.minimumAspectRatio = 0.8
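For completeness, I perform the request in the usual way (assuming I already have a cgImage of the frame):

let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try handler.perform([rectDetectRequest])  // runs synchronously on the calling thread
let observations = rectDetectRequest.results as? [VNRectangleObservation] ?? []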

I had success doing the same thing with contour detection in OpenCV in the past, but this Vision framework is very clean and performant on-device, so I would much rather stick with it. Any insight would be much appreciated.

[Image: Experimental Shape]

lepapillon
  • Since I see only one image I don't know for sure, but it could be one of two issues. Maybe you are misunderstanding what is returned: the docs say it returns "<...>normalized coordinates of **bounding boxes containing the rectangle**". So it's not supposed to touch the rectangle itself; it returns a tight bounding box around it. I would say that on your image it found a perfectly tight bounding box. But if it's really incorrect, then it might have to do with how you are converting normalized coordinates to a displayed rectangle. From your image, though, that doesn't appear to be the case. – timbre timbre Mar 26 '22 at 23:25
  • Also "perspective transform come out warped due to this padding": well, how precise your perspective transform has to be? I do perspective transform too, and have no issues with this bounding box being few pixels bigger. Another possibility because of very confusing coordinates system, it may be that you are not applying perspective transform correctly. For example I have to transpose Y coordinate to get the right coordinates, otherwise the upside down or mirrored version of the image will be produced (but it depends on what you do with the image too). – timbre timbre Mar 27 '22 at 00:02
  • Thanks for the feedback! I saw that the returned rectangle observation is in a normalized space, and I believe I have the conversion happening correctly when I draw it on the actual image after inverting the Y coordinate. It matches the rect I get when I do a perspective transform using the rect on the image within Vision's normalized space. My algorithm further subdivides the rectangle into many sections, so the padding really throws off the alignment. But given how performant this Vision API is, I am wondering whether that padding is constant (so I can just trim it) or whether there's a better approach. – lepapillon Mar 27 '22 at 02:36
  • I don't have a formula for the padding. You will have to experiment, I guess... Or, if you wanted to be really precise, you could use the Canny algorithm to highlight edges within the image, and then search for edges inside of, and closest to, the detected bounding box to identify where the rectangle's borders start (see the sketch after these comments). This is probably too heavy for real-time video processing, but it would work for static image processing. – timbre timbre Mar 28 '22 at 16:01
  • Yeah, I will run a few tests and see if there's a constant to the padding. I'll post here in case the results are interesting for you and others. I used to do this with contour detection in OpenCV, so as you mentioned, running a contour algorithm on the cropped rectangle may work well to remove any border. – lepapillon Mar 29 '22 at 00:13
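A minimal sketch of the refinement idea from the comments, using Vision's own VNDetectContoursRequest (iOS 14+) in place of Canny. The contrastAdjustment value and the largest-contour heuristic are assumptions to tune, not a known-good recipe:

import Vision
import CoreImage

// Sketch: run contour detection only inside the region Vision already found,
// then take the largest top-level contour's bounding box as the tighter rect.
func refine(_ image: CIImage, around box: CGRect) throws -> CGRect? {
    // `box` is in pixels, lower-left origin (Core Image's coordinate space).
    // Crop and translate so the region of interest starts at the origin.
    let cropped = image.cropped(to: box)
        .transformed(by: CGAffineTransform(translationX: -box.origin.x,
                                           y: -box.origin.y))
    let request = VNDetectContoursRequest()
    request.contrastAdjustment = 1.5  // assumption: boost contrast; tune per input
    let handler = VNImageRequestHandler(ciImage: cropped, options: [:])
    try handler.perform([request])
    guard let contours = request.results?.first as? VNContoursObservation else {
        return nil
    }
    // Heuristic: assume the biggest top-level contour is the rectangle's border.
    let largest = contours.topLevelContours.max { a, b in
        let ra = a.normalizedPath.boundingBox, rb = b.normalizedPath.boundingBox
        return ra.width * ra.height < rb.width * rb.height
    }
    guard let tight = largest?.normalizedPath.boundingBox else { return nil }
    // Map the normalized rect back into the original image's pixel space.
    return CGRect(x: box.origin.x + tight.origin.x * box.width,
                  y: box.origin.y + tight.origin.y * box.height,
                  width: tight.width * box.width,
                  height: tight.height * box.height)
}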

0 Answers