
I'm working with the Vision framework to detect faces in images. I couldn't find in Apple's documentation what the input image requirements are. Usually when working with a machine learning model, and particularly with a .mlmodel in Core ML, the model describes its required input, for example Image (Color 112 x 112).

import UIKit
import Vision

let image: UIImage = someUIImage()
// Wrap the CGImage in a CIImage and hand it to the Vision request handler
let handler = VNImageRequestHandler(ciImage: CIImage(cgImage: image.cgImage!))
let faceRequest = VNDetectFaceLandmarksRequest(completionHandler: { (request: VNRequest, error: Error?) in
    guard let observations = request.results as? [VNFaceObservation] else {
        print("unexpected result type from VNDetectFaceLandmarksRequest")
        return
    }
    self.doSomething(with: observations)
})

do {
    try handler.perform([faceRequest])
} catch {
    print("Face detection failed: \(error)")
}
Sanich

1 Answer

It doesn't matter; Vision automatically takes care of this. (It may or may not use a machine learning model under the hood.)

You do need to make sure the entire face / head is visible in the image, or the face detector won't work very well.

It's also possible it won't work well with really small images, but I've never tried this.
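If you want to check the small-image behavior empirically, here is a minimal sketch (reusing the question's `someUIImage()` helper; the 64x64 size is just an arbitrary example, not an Apple recommendation) that runs the same face request on the original image and on a downscaled copy and compares the number of detected faces:

import UIKit
import Vision

// Run face landmark detection and return the number of faces found.
func countFaces(in cgImage: CGImage) -> Int {
    let request = VNDetectFaceLandmarksRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try? handler.perform([request])
    return request.results?.count ?? 0
}

// Downscale a UIImage to the given size using UIGraphicsImageRenderer.
func downscaled(_ image: UIImage, to size: CGSize) -> UIImage {
    return UIGraphicsImageRenderer(size: size).image { _ in
        image.draw(in: CGRect(origin: .zero, size: size))
    }
}

let original = someUIImage()   // e.g. a 1080x1920 photo
let small = downscaled(original, to: CGSize(width: 64, height: 64))

print("faces in original:", countFaces(in: original.cgImage!))
print("faces in small copy:", countFaces(in: small.cgImage!))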

Matthijs Hollemans
  • I'm fetching images from the camera roll. I guess it won't work well with the smallest available image in the cache (32x32), and on the other hand a full-sized image is also unneeded. If they mentioned the suggested or minimal size, I would know what to fetch. Now I'm just guessing. – Sanich Sep 17 '19 at 17:02
  • I use it on 1080x1920 images from the camera without problems. I'm sure Vision internally resizes the images to whatever size it prefers. – Matthijs Hollemans Sep 17 '19 at 18:46
  • It is critical if you want to process all photos in the camera roll (10k photos). I'm fetching 224x224 with `.fastFormat` delivery mode. I'm getting a degraded 90x120 image, and this is the input for the VNRequest, and it works well. But that's kind of guessing (a rough sketch of this pipeline is shown after these comments). – Sanich Sep 17 '19 at 20:41
  • I guess you could set a breakpoint and then use the debugger to step through what Vision is doing. ;-) There is bound to be a call to `vImageScale_xxx()` in there somewhere. – Matthijs Hollemans Sep 18 '19 at 08:53
  • I've read their paper describing what they are doing. They use convolution in the first layers, meaning you can give any input size. Maybe it's a kind of new concept, that the better the input, the better (more accurate) the output, as opposed to how models are usually trained on a fixed input size. – Sanich Sep 18 '19 at 09:00
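For context, a rough sketch of the pipeline described in the comments might look like the following: fetch small `.fastFormat` thumbnails from the photo library and feed each one to a face landmarks request. The 224x224 target size comes from the comments, not from any documented requirement, and the sketch assumes photo library access has already been authorized.

import Photos
import UIKit
import Vision

// Request small, fast-format thumbnails; the result may be a degraded image (e.g. 90x120).
let fetchOptions = PHImageRequestOptions()
fetchOptions.deliveryMode = .fastFormat
fetchOptions.isSynchronous = true   // keeps the sketch simple; avoid on the main thread

// Walk every image in the library and run face detection on each thumbnail.
let assets = PHAsset.fetchAssets(with: .image, options: nil)
assets.enumerateObjects { asset, _, _ in
    PHImageManager.default().requestImage(for: asset,
                                          targetSize: CGSize(width: 224, height: 224),
                                          contentMode: .aspectFit,
                                          options: fetchOptions) { image, _ in
        guard let cgImage = image?.cgImage else { return }
        let request = VNDetectFaceLandmarksRequest()
        let handler = VNImageRequestHandler(cgImage: cgImage)
        try? handler.perform([request])
        let faceCount = request.results?.count ?? 0
        print("\(asset.localIdentifier): \(faceCount) face(s)")
    }
}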