I'm trying to use Vision + CoreML in an app, and it works as expected with Apple's Resnet50 model. However, I only want to run Vision when the user taps a button. I believe this is the function that classifies the object:
private func setupVision() {
    guard let model = try? VNCoreMLModel(for: visionClassifier.model) else { return }
    request = VNCoreMLRequest(model: model) { (finishedReq, err) in
        // Get the results list and the first observation
        guard let results = finishedReq.results as? [VNClassificationObservation] else { return }
        guard let firstObservation = results.first else { return }

        // Format string output
        let name: String = firstObservation.identifier
        let conf: String = "Confidence: \(firstObservation.confidence * 100)"

        // Return the results from the background thread to the main thread
        DispatchQueue.main.async {
            self.identifier = name
            self.confidence = conf
        }
    }
}
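For completeness, request is a stored property on the same object, declared roughly like this (an implicitly unwrapped optional, so it stays nil until setupVision() runs):

// Set up in setupVision(); nil until then
private var request: VNCoreMLRequest!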
You might think I could just call that function from the button's action, but right now it is invoked from the view's .onAppear() through this method:
func prepareCapture() {
    setupSession()
    setupVision()
    startSession()
}
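For reference, the view is set up roughly like this (ScannerView, CameraPreview, and scanner are simplified placeholders for my actual types):

struct ScannerView: View {
    @ObservedObject var scanner: ScannerViewModel   // owns the session, request, identifier, confidence

    var body: some View {
        VStack {
            CameraPreview(session: scanner.session)        // live camera feed
            Text("\(scanner.identifier) \(scanner.confidence)")
            Button("Scan") { /* this is where I want classification to be triggered */ }
        }
        .onAppear { scanner.prepareCapture() }             // session + Vision setup happens here
    }
}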
If I take the setupVision() call out of prepareCapture(), the app crashes in captureOutput, presumably because request is still nil when the first frame arrives:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // The captured video frame is stored in a CVPixelBuffer object
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // Use a VNImageRequestHandler to perform the request
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])   // This is the line that crashes
}
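One idea I've been toying with is to keep the video feed running but only perform the request for the single frame after the tap, gated behind a flag (scanRequested is a hypothetical property the Scan button would set to true):

// Hypothetical flag, set to true from the Scan button's action
private var scanRequested = false

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Ignore frames until the user actually asks for a scan
    guard scanRequested else { return }
    scanRequested = false

    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}

But I'm not sure whether that's the right pattern, which is why I'm asking: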
What can I do to implement a snap-and-scan feature? Should I let users take a picture (without presenting it to them) and handle the rest the same way it's done when using CoreML with an ImagePicker?
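If the picture route is the way to go, I imagine it would look roughly like this (photoOutput and snapAndScan() are hypothetical additions, and I'm assuming the class is an NSObject that conforms to AVCapturePhotoCaptureDelegate):

// Hypothetical still-image output, added to the existing capture session in setupSession()
private let photoOutput = AVCapturePhotoOutput()

// Called from the Scan button's action
func snapAndScan() {
    photoOutput.capturePhoto(with: AVCapturePhotoSettings(), delegate: self)
}

// AVCapturePhotoCaptureDelegate callback: run the same Vision request on the captured still
func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
    guard let data = photo.fileDataRepresentation() else { return }
    try? VNImageRequestHandler(data: data, options: [:]).perform([request])
}

But I'm not sure that's the cleanest way to trigger Vision on demand.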
Any help would be great!