
I am trying to combine CoreML and ARKit in my project, using the InceptionV3 model provided on Apple's website.

I am starting from the standard ARKit template (Xcode 9 beta 3).

Instead of instantiating a new camera session, I reuse the session that has been started by the ARSCNView.

At the end of my viewDidLoad, I write:

sceneView.session.delegate = self
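
For context, this is roughly where that line ends up (a sketch based on the template's viewDidLoad; sceneView is the ARSCNView outlet the ARKit template already provides):

override func viewDidLoad() {
    super.viewDidLoad()

    // Template code: the view controller already acts as the ARSCNView's delegate
    sceneView.delegate = self

    // Added: also receive per-frame updates from the AR session
    sceneView.session.delegate = self
}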

I then extend my ViewController to conform to the ARSessionDelegate protocol (all of its methods are optional):

// MARK: ARSessionDelegate
extension ViewController: ARSessionDelegate {

    func session(_ session: ARSession, didUpdate frame: ARFrame) {

        do {
            // frame.capturedImage is the raw pixel buffer ARKit captures from the camera each frame
            let prediction = try self.model.prediction(image: frame.capturedImage)
            DispatchQueue.main.async {
                if let prob = prediction.classLabelProbs[prediction.classLabel] {
                    self.textLabel.text = "\(prediction.classLabel) \(String(describing: prob))"
                }
            }
        }
        catch let error as NSError {
            print("Unexpected error occurred: \(error.localizedDescription).")
        }
    }
}
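
For reference, the snippet above assumes a couple of properties on the ViewController, roughly like this (Inceptionv3 is assumed to be the class Xcode generates from the downloaded .mlmodel; textLabel is a UILabel used to show the result):

// Assumed declarations inside ViewController (illustrative names):
let model = Inceptionv3()               // Xcode-generated Core ML wrapper class
@IBOutlet weak var textLabel: UILabel!  // label that displays the top classification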

At first I tried that code, but then noticed that Inception requires a pixel buffer of type Image<RGB, 299, 299>.

Although not recommended, I thought I would just resize my frame and then try to get a prediction out of it. I am resizing using this function (taken from https://github.com/yulingtianxia/Core-ML-Sample):

func resize(pixelBuffer: CVPixelBuffer) -> CVPixelBuffer? {
    let imageSide = 299
    var ciImage = CIImage(cvPixelBuffer: pixelBuffer, options: nil)
    // Scale width and height independently down to 299 (this does not preserve aspect ratio),
    // then crop to a 299x299 square.
    let transform = CGAffineTransform(scaleX: CGFloat(imageSide) / CGFloat(CVPixelBufferGetWidth(pixelBuffer)),
                                      y: CGFloat(imageSide) / CGFloat(CVPixelBufferGetHeight(pixelBuffer)))
    ciImage = ciImage.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: imageSide, height: imageSide))
    let ciContext = CIContext()
    var resizeBuffer: CVPixelBuffer?
    // The destination buffer is created with the source's pixel format; the error below stems from this.
    CVPixelBufferCreate(kCFAllocatorDefault, imageSide, imageSide, CVPixelBufferGetPixelFormatType(pixelBuffer), nil, &resizeBuffer)
    guard let outputBuffer = resizeBuffer else { return nil }
    ciContext.render(ciImage, to: outputBuffer)
    return outputBuffer
}
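
With that helper in place, the delegate method above would look roughly like this (a sketch reusing the names from the snippets above; the guard simply skips frames that fail to resize):

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Resize the captured frame down to the 299x299 input Inception expects
    guard let resizedBuffer = resize(pixelBuffer: frame.capturedImage) else { return }
    do {
        let prediction = try self.model.prediction(image: resizedBuffer)
        DispatchQueue.main.async {
            self.textLabel.text = prediction.classLabel
        }
    } catch {
        print("Unexpected error occurred: \(error.localizedDescription).")
    }
}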

Unfortunately, this is not enough to make it work. This is the error that gets caught:

Unexpected error occurred: Input image feature image does not match model description.
2017-07-20 AR+MLPhotoDuplicatePrediction[928:298214] [core] 
    Error Domain=com.apple.CoreML Code=1 
    "Input image feature image does not match model description" 
    UserInfo={NSLocalizedDescription=Input image feature image does not match model description, 
    NSUnderlyingError=0x1c4a49fc0 {Error Domain=com.apple.CoreML Code=1 
    "Image is not expected type 32-BGRA or 32-ARGB, instead is Unsupported (875704422)" 
    UserInfo={NSLocalizedDescription=Image is not expected type 32-BGRA or 32-ARGB, instead is Unsupported (875704422)}}}

Not sure what I can do from here.

If there is any better suggestion to combine both, I'm all ears.

Edit: I also tried the resizePixelBuffer method from YOLO-CoreML-MPSNNGraph, suggested by @dfd; the error is exactly the same.

Edit2: I changed the pixel format to kCVPixelFormatType_32BGRA (i.e. no longer the same format as the pixelBuffer passed into resizePixelBuffer):

let pixelFormat = kCVPixelFormatType_32BGRA // line 48
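
Mapped onto the resize(pixelBuffer:) function shown earlier, the change amounts to creating the destination buffer with an explicit 32BGRA format instead of reusing the source's (a sketch, not the exact line from that repository):

let pixelFormat = kCVPixelFormatType_32BGRA // a format Core ML's image inputs accept
CVPixelBufferCreate(kCFAllocatorDefault, imageSide, imageSide, pixelFormat, nil, &resizeBuffer)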

I no longer get that error. But as soon as I try to make a prediction, the AVCaptureSession stops. It seems I am running into the same issue Enric_SA describes on the Apple Developer Forums.

Edit3: I tried implementing rickster's solution. It works well with InceptionV3. I then wanted to try a feature observation (VNCoreMLFeatureValueObservation); at this point it is not working with TinyYOLO, as the bounding boxes are wrong. Trying to figure it out.

Swift Rabbit
  • By far, the recommendation I'd give is a blog (http://machinethink.net/blog/yolo-coreml-versus-mps-graph/) and associated repo (https://github.com/hollance/YOLO-CoreML-MPSNNGraph). Yes, this is as much about YOLO as about CoreML (and quite deep on ML), but the code has a good way to resize a UIImage as 224x224 and return a CVPixelBuffer. I commented on another question here about 2 days ago - you might find that code by searching on "pixel buffer" and "Swift", sorting by question date. –  Jul 20 '17 at 18:15
  • @dfd Tried it, the result (error) is the same. The project directly uses a camera session, it does not try to reuse the ARSCNView session. – Swift Rabbit Jul 20 '17 at 18:58

1 Answer


Don't process images yourself to feed them to Core ML. Use the Vision framework. Vision takes an ML model and any of several image types (including CVPixelBuffer), automatically gets the image to the right size, aspect ratio, and pixel format for the model to evaluate, then gives you the model's results.

Here's a rough skeleton of the code you'd need:

import ARKit
import Vision

var request: VNRequest!

func setup() {
    // Wrap the Xcode-generated Core ML class so Vision can drive it
    let model = try! VNCoreMLModel(for: MyCoreMLGeneratedModelClass().model)
    request = VNCoreMLRequest(model: model, completionHandler: myResultsMethod)
}

func classifyARFrame() {
    guard let frame = session.currentFrame else { return }
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .up) // fix based on your UI orientation
    try? handler.perform([request])
}

func myResultsMethod(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation]
        else { fatalError("huh") }
    for classification in results {
        print(classification.identifier, // the scene label
              classification.confidence)
    }
}
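
In the asker's setup, this could be driven straight from the ARSessionDelegate callback; a sketch (dispatching to a background queue is an assumption to keep heavy work off the session's delegate queue, not something the answer prescribes):

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .up)
    // Assumption: perform the request off the delegate queue so frame delivery isn't blocked
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([self.request])
    }
}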

See this answer to another question for some more pointers.

rickster
  • Hello @rickster. It works fine with VNClassificationObservation, but I am having issues with VNCoreMLFeatureValueObservation straight from the tiny-yolo example (https://github.com/hollance/YOLO-CoreML-MPSNNGraph). It's like my bounding boxes are not having the right size. Pretty weird, but I can't figure why yet. – Swift Rabbit Jul 25 '17 at 18:30
  • Hard to say without more detailed diagnostics (and it's a bit outside my bailiwick even then), but if I had to hazard a guess it might be something about pixel dimensions vs normalized image coordinates vs crop/scale option, with some possibility of video clean aperture also throwing things off. – rickster Aug 01 '17 at 22:32
  • @ina: I haven't put much time into it since then. But my most educated guess is that casting the request results as? [VNCoreMLFeatureValueObservation] is not parsing the multiArrayValue of the observations the way it should. That would explain why the bounding boxes are off. I am not certain, though; I would have to run the array through a matrix or something like that. – Swift Rabbit Aug 16 '17 at 19:12