
I have an app that does real-time filtering on the camera feed: I get each frame from the camera, apply some filtering using CIFilter, and then pass the final frame (a CIImage) to an MTKView that is shown in my SwiftUI view. This works fine. But when I add real-time face/body detection on the camera feed, the frame rate drops to 8 frames per second and becomes super laggy. I have tried everything I could find on the internet, using Vision, CIDetector, and CoreML; the result is always the same. I do run the detection on a global queue, which keeps the UI responsive (things like a ScrollView work fine), but the feed shown in the main view is still laggy.

I also tried changing the view from MTKView to UIImageView. Xcode then shows it rendering at 120 FPS (which I don't understand, since it is 30 FPS when not using any face detection), but the feed is still laggy and somehow cannot keep up with the output frame rate. I also tried just passing the incoming image to the MTKView (without any filtering in between, but with face detection): the same laggy result. Without face detection it goes to 30 FPS (why not 120?). I'm new to this and I don't understand why this happens.

This is the code I'm using for converting the sampleBuffer to a CIImage:

extension CICameraCapture: AVCaptureVideoDataOutputSampleBufferDelegate {
  func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {

    guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    var ciImage = CIImage(cvImageBuffer: imageBuffer)

    if self.cameraPosition == AVCaptureDevice.Position.front {
        ciImage = ciImage.oriented(.downMirrored)
    }
    ciImage = ciImage.transformed(by: CGAffineTransform(rotationAngle: 3 * .pi / 2))
    ciImage = ciImage.transformToOrigin(withSize: ciImage.extent.size)
    // Real-time face detection. I have done this with Vision and also with
    // CIDetector; CIDetector is a little faster when set to low accuracy,
    // but still not the desired frame rate.
    detectFace(image: ciImage)

    DispatchQueue.main.async {
        self.callback(ciImage)
    }

  }
}  
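When I run the detection off the capture queue, it has roughly this shape. This is a minimal sketch rather than my exact code: a serial detection queue (like the commented-out detectionQ in my capture setup below) plus a semaphore, so frames are simply dropped while a detection is still in flight; the handOffForDetection name and the semaphore are illustrative:

// Sketch: hand each frame to a serial detection queue and drop frames while
// a detection is still running, so the render path never waits on the detector.
private let detectionQueue = DispatchQueue(label: "detectionQ", qos: .userInitiated)
private let detectionSemaphore = DispatchSemaphore(value: 1)

func handOffForDetection(_ ciImage: CIImage) {
    // Returns immediately (dropping the frame) if a detection is in flight.
    guard detectionSemaphore.wait(timeout: .now()) == .success else { return }
    let semaphore = detectionSemaphore
    detectionQueue.async { [weak self] in
        self?.detectFace(image: ciImage)
        semaphore.signal()
    }
}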

And this is the MTKView code, which is a very simple, basic implementation:

import MetalKit
import CoreImage

class MetalRenderView: MTKView {
    //var textureCache: CVMetalTextureCache?

  override init(frame frameRect: CGRect, device: MTLDevice?) {
    super.init(frame: frameRect, device: device)

    if super.device == nil {
      fatalError("No support for Metal. Sorry")
    }

    framebufferOnly = false
    preferredFramesPerSecond = 120
    sampleCount = 2
  }

  required init(coder: NSCoder) {
    fatalError("init(coder:) has not been implemented")
  }

  private lazy var commandQueue: MTLCommandQueue? = {
    [unowned self] in
    return self.device!.makeCommandQueue()
  }()

  private lazy var ciContext: CIContext = {
    [unowned self] in
    return CIContext(mtlDevice: self.device!)
  }()

  var image: CIImage? {
    didSet {
        renderImage()
    }
  }

  private func renderImage() {
    guard var image = image else { return }
    // transformToOrigin is an extension that resizes the image to the render
    // size, so I don't get a render error while rendering a frame.
    image = image.transformToOrigin(withSize: drawableSize)

    let commandBuffer = commandQueue?.makeCommandBuffer()
    let destination = CIRenderDestination(width: Int(drawableSize.width),
                                          height: Int(drawableSize.height),
                                          pixelFormat: .bgra8Unorm,
                                          commandBuffer: commandBuffer) { () -> MTLTexture in
                                            return self.currentDrawable!.texture
    }

    try! ciContext.startTask(toRender: image, to: destination)

    commandBuffer?.present(currentDrawable!)
    commandBuffer?.commit()
    draw()

  }

}

And here is the code for face detection using CIDetector:

func detectFace(image: CIImage) {
    //DispatchQueue.global().async {

    let options = [CIDetectorAccuracy: CIDetectorAccuracyHigh,
                   CIDetectorSmile: true,
                   CIDetectorTypeFace: true] as [String: Any]

    let faceDetector = CIDetector(ofType: CIDetectorTypeFace,
                                  context: nil,
                                  options: options)!

    let faces = faceDetector.features(in: image)

    if let face = faces.first as? CIFaceFeature {
        AppState.shared.mouth = face.mouthPosition
        AppState.shared.leftEye = face.leftEyePosition
        AppState.shared.rightEye = face.rightEyePosition
    }

    //}
}
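A side note on the code above: it creates a new CIDetector on every call, i.e. on every frame. As far as I know, a detector is expensive to create and is meant to be created once and reused; a minimal sketch of that change (with low accuracy, which I mentioned is faster):

// Create the detector once and reuse it across frames; instantiating a
// CIDetector per frame is expensive.
private lazy var faceDetector: CIDetector? = CIDetector(
    ofType: CIDetectorTypeFace,
    context: nil,
    options: [CIDetectorAccuracy: CIDetectorAccuracyLow])

func detectFace(image: CIImage) {
    guard let detector = faceDetector else { return }
    if let face = detector.features(in: image).first as? CIFaceFeature {
        AppState.shared.mouth = face.mouthPosition
        AppState.shared.leftEye = face.leftEyePosition
        AppState.shared.rightEye = face.rightEyePosition
    }
}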

What I have tried

1) Different face detection methods, using Vision, CIDetector, and also CoreML (the last one not very deeply, as I don't have experience with it). I get the detection info, but the frame rate is 8, or 15 in the best case (which means a delayed detection).

2) I've read somewhere that it might be a result of the image colorspace, so I tried different video settings and different rendering colorspaces; still no change in the frame rate.

3) I was fairly sure it might be related to the pixel buffer's release timing, so I deep-copied the imageBuffer and passed the copy to the detection (see the sketch just below). Aside from some memory issues, this went up to 15 FPS, but still not a minimum of 30 FPS. Here I also tried converting the imageBuffer to a CIImage, rendering that to a CGImage, and then going back to a CIImage, just to release the buffer, but I could not get more than 15 FPS (on average; it sometimes goes to 17 or 19, but it is still laggy).
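The deep copy looked roughly like this. A minimal sketch, assuming a single-plane format such as 32BGRA; a planar format like the camera's default 420 would need each plane copied separately:

import CoreVideo

// Sketch: deep-copy a CVPixelBuffer so the camera's own buffer can be
// released immediately. Assumes a single-plane format (e.g. 32BGRA).
func deepCopy(_ source: CVPixelBuffer) -> CVPixelBuffer? {
    var copy: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault,
                        CVPixelBufferGetWidth(source),
                        CVPixelBufferGetHeight(source),
                        CVPixelBufferGetPixelFormatType(source),
                        nil,
                        &copy)
    guard let destination = copy else { return nil }

    CVPixelBufferLockBaseAddress(source, .readOnly)
    CVPixelBufferLockBaseAddress(destination, [])
    defer {
        CVPixelBufferUnlockBaseAddress(destination, [])
        CVPixelBufferUnlockBaseAddress(source, .readOnly)
    }

    // Copy row by row, because source and destination can have different
    // bytes-per-row alignment.
    if let src = CVPixelBufferGetBaseAddress(source),
       let dst = CVPixelBufferGetBaseAddress(destination) {
        let srcBytesPerRow = CVPixelBufferGetBytesPerRow(source)
        let dstBytesPerRow = CVPixelBufferGetBytesPerRow(destination)
        for row in 0..<CVPixelBufferGetHeight(source) {
            memcpy(dst + row * dstBytesPerRow,
                   src + row * srcBytesPerRow,
                   min(srcBytesPerRow, dstBytesPerRow))
        }
    }
    return destination
}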

I'm new to this and still trying to figure it out; I would appreciate any suggestions, samples, or tips that could point me toward a better way of solving this.

Update

This is the camera capture setup code:

class CICameraCapture: NSObject {
    typealias Callback = (CIImage?) -> ()
    private var cameraPosition = AVCaptureDevice.Position.front
    var ciContext: CIContext?
    let callback: Callback
    private let session = AVCaptureSession()
    private let sampleBufferQueue = DispatchQueue(label: "buffer", qos: .userInitiated)//, attributes: [], autoreleaseFrequency: .workItem)
    // face detection
    //private var sequenceHandler = VNSequenceRequestHandler()
    //var request: VNCoreMLRequest!
    //var visionModel: VNCoreMLModel!
    //let detectionQ = DispatchQueue(label: "detectionQ", qos: .background)//, attributes: [], autoreleaseFrequency: .workItem)

    init(callback: @escaping Callback) {
        self.callback = callback
        super.init()
        prepareSession()
        ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)
    }

    func start() {
        session.startRunning()
    }

    func stop() {
        session.stopRunning()
    }


  private func prepareSession() {
    session.sessionPreset = .high //.hd1920x1080
    let cameraDiscovery = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInDualCamera, .builtInWideAngleCamera], mediaType: .video, position: cameraPosition)
    guard let camera = cameraDiscovery.devices.first else { fatalError("Can't get hold of the camera") }
    //try! camera.lockForConfiguration()
    //camera.activeVideoMinFrameDuration = camera.formats[0].videoSupportedFrameRateRanges[0].minFrameDuration
    //camera.activeVideoMaxFrameDuration = camera.formats[0].videoSupportedFrameRateRanges[0].maxFrameDuration
    //camera.unlockForConfiguration()
    guard let input = try? AVCaptureDeviceInput(device: camera) else { fatalError("Can't get hold of the camera") }

    session.addInput(input)

    let output = AVCaptureVideoDataOutput()
    output.videoSettings = [:]
    //print(output.videoSettings.description)
    //[875704438, 875704422, 1111970369]
    //output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32BGRA)]
    output.setSampleBufferDelegate(self, queue: sampleBufferQueue)

    session.addOutput(output)
    session.commitConfiguration()
  }
}
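For completeness, this is what I understand the commented-out lockForConfiguration block above should become, based on the discussion in the comments below: pick the format and frame-rate range with the highest supported rate and set both frame durations from its minFrameDuration (a small duration means a high FPS). A minimal, untested sketch:

// Sketch: ask the camera for its highest supported frame rate.
func configureMaxFrameRate(for camera: AVCaptureDevice) {
    // Find the frame-rate range with the highest max FPS across all formats;
    // the first format/range is not necessarily the fastest one.
    var best: (format: AVCaptureDevice.Format, range: AVFrameRateRange)?
    for format in camera.formats {
        for range in format.videoSupportedFrameRateRanges {
            if best == nil || range.maxFrameRate > best!.range.maxFrameRate {
                best = (format, range)
            }
        }
    }
    guard let (format, range) = best else { return }

    do {
        try camera.lockForConfiguration()
        // Note: setting activeFormat overrides the session preset.
        camera.activeFormat = format
        camera.activeVideoMinFrameDuration = range.minFrameDuration
        camera.activeVideoMaxFrameDuration = range.minFrameDuration
        camera.unlockForConfiguration()
    } catch {
        print("Could not lock the camera for configuration: \(error)")
    }
}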
  • What is the resolution of the video feed? Face detection is a pretty expensive operation, and performing it on a large image will cause performance drops. – Frank Rupprecht Mar 05 '20 at 09:39
  • And the cap to 30 FPS is probably due to the camera settings. You have to tell the camera to produce more frames per second. – Frank Rupprecht Mar 05 '20 at 09:41
  • For the first question: the sessionPreset is .high (which is 1920x1080), but I also tried .medium and .low, same thing. The only thing that makes it faster is using CIDetector with accuracy set to low; any other tweak I have tried gives the same result. – Mostafa Mar 05 '20 at 12:01
  • As for the camera frame rate, I haven't set anything; I only set it to 120 frames on the MTKView, since I thought the camera feed would default to the highest rate. After seeing your comment I also tried setting the camera's minFrameDuration to the highest possible, but still no luck: **camera.activeVideoMinFrameDuration = camera.formats[0].videoSupportedFrameRateRanges[0].maxFrameDuration** (I don't know if that's the best way to do it). – Mostafa Mar 05 '20 at 12:05
  • I think you need to set the `activeVideoMaxFrameDuration` to a minimum (duration = 1/FPS). – Frank Rupprecht Mar 05 '20 at 12:13
  • Thanks for the comment! I have just tried it, still 30 FPS. [https://drive.google.com/file/d/1U2XbCsR_dmN3d8qCrCK8sv5yKCISSILQ/view?usp=sharing] This is the link to the screenshot. – Mostafa Mar 05 '20 at 12:26
  • Could you maybe add your capture setup code? – Frank Rupprecht Mar 05 '20 at 12:30
  • Just updated the question and added the capture setup code. – Mostafa Mar 05 '20 at 12:39
  • `videoSupportedFrameRateRanges[0].maxFrameDuration`: the first one doesn't need to be the range with the highest FPS, right? Also, you need to use the _min_ frame duration (small duration = high FPS). – Frank Rupprecht Mar 05 '20 at 12:47
  • I suggest you download the sample project from Apple and compare it to your setup: https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time – Frank Rupprecht Mar 05 '20 at 12:49
  • Thanks, I already saw it. The only reason the performance there is OK is that it uses `videoPreviewLayer`, which I cannot use: I want to show the filtered frame, not the raw capture video output. I would have to change to a `CALayer`, and that's not the approach I'm looking for, unless I'm missing something. – Mostafa Mar 05 '20 at 13:21
  • It should really not make much of a difference whether you render into an `MTKView` yourself or let them do the rendering... Can you please try to set the `commandBuffer` of your `CIRenderDestination` to `nil`? – Frank Rupprecht Mar 05 '20 at 13:43
  • I did set the commandBuffer to nil; no difference. What do you mean by "let them do the rendering"? If I want to show the filter that I'm applying to the frame on a CALayer, then when I want to record the video it would still have to be rendered into a pixelBuffer, which is the same thing, isn't it? – Mostafa Mar 05 '20 at 14:04
  • `videoPreviewLayer` is basically their implementation of the render-to-display that you are doing with MTKView. That's what I meant. – Frank Rupprecht Mar 05 '20 at 14:21
  • Ah, got it. Is there any way I could apply a CIFilter to the preview? Because as far as I know, a CALayer just adds a filtered layer on top of the view rather than manipulating the image behind it; is that right? – Mostafa Mar 05 '20 at 14:41
  • Sorry, I don't understand how `CALayer` is involved here... You can try to optimize the draw timing of your `MTKView`: move the code from your `renderImage()` to `draw(in view: MTKView)` (without the `draw()` call at the end), configure the view to be paused (`isPaused = true`) and to react to needsDisplay (`enableSetNeedsDisplay = true`). Then you can call `setNeedsDisplay()` whenever you actually have a new camera frame. – Frank Rupprecht Mar 05 '20 at 20:41
  • But then again, performing face detection _and_ filters on a FullHD camera feed _is_ expensive. What makes you assume it will run at >= 30 FPS? – Frank Rupprecht Mar 05 '20 at 20:45
  • Thanks a lot, that actually made it a little better. On quality: I thought that at least without face detection it would be above 30 or even 60 FPS, but it was fixed at 30 when not using face detection; after the MTKView changes you suggested it went up to 60 (without face detection). The interesting thing is that with the VGA preset it is also laggy, around 8 FPS with face detection. – Mostafa Mar 06 '20 at 11:04
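Following the last suggestions, this is a minimal sketch of the draw-timing change Frank Rupprecht describes above, applied to the MetalRenderView from the question (only the parts that change; the rest of the class stays the same):

// Redraw only when a new camera frame arrives, instead of letting MTKView
// run its own display link.
override init(frame frameRect: CGRect, device: MTLDevice?) {
    super.init(frame: frameRect, device: device)
    framebufferOnly = false
    isPaused = true                // stop the internal draw loop
    enableSetNeedsDisplay = true   // draw on setNeedsDisplay() instead
}

var image: CIImage? {
    didSet { setNeedsDisplay() }   // one draw per new camera frame
}

override func draw(_ rect: CGRect) {
    guard var image = image,
          let drawable = currentDrawable,
          let commandBuffer = commandQueue?.makeCommandBuffer() else { return }

    image = image.transformToOrigin(withSize: drawableSize)

    let destination = CIRenderDestination(width: Int(drawableSize.width),
                                          height: Int(drawableSize.height),
                                          pixelFormat: .bgra8Unorm,
                                          commandBuffer: commandBuffer) { drawable.texture }

    try? ciContext.startTask(toRender: image, to: destination)
    commandBuffer.present(drawable)
    commandBuffer.commit()
}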

0 Answers