
When I add an audio input to the capture session, the `photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?)` callback returns the semantic segmentation mattes properly. Without the audio input, the returned mattes are nil. Is it possible to get the mattes without adding an audio input and asking the user for microphone permission?

// MARK: - Session

private func setupSession() {
    captureSession = AVCaptureSession()
    captureSession?.sessionPreset = .photo
    setupInputOutput()
    setupPreviewLayer(view)
    captureSession?.startRunning()
}
// MARK: - Settings

private func setupCamera() {
    
    settings = AVCapturePhotoSettings()
    
    // HEVC export-preset availability is used as a proxy for HEVC photo support.
    let supportsHEVC = AVAssetExportSession.allExportPresets().contains(AVAssetExportPresetHEVCHighestQuality)

    settings = supportsHEVC
        ? AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.hevc])
        : AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.jpeg])
    
    settings!.flashMode = .auto
    settings!.isHighResolutionPhotoEnabled = true
    settings!.previewPhotoFormat = [kCVPixelBufferPixelFormatTypeKey as String: settings!.__availablePreviewPhotoPixelFormatTypes.first ?? NSNumber()]
    settings!.isDepthDataDeliveryEnabled = true
    settings!.isPortraitEffectsMatteDeliveryEnabled = true
    if self.photoOutput?.enabledSemanticSegmentationMatteTypes.isEmpty == false {
        settings!.enabledSemanticSegmentationMatteTypes = self.photoOutput?.enabledSemanticSegmentationMatteTypes ?? [AVSemanticSegmentationMatte.MatteType]()
    }

    settings!.photoQualityPrioritization = self.photoQualityPrioritizationMode
}

private func setupInputOutput() {
    photoOutput = AVCapturePhotoOutput()
    
    guard let captureSession = captureSession, let photoOutput = photoOutput else { return }
    
    do {
        captureSession.beginConfiguration()
        captureSession.sessionPreset = .photo
        let devices = self.videoDeviceDiscoverySession.devices
        currentDevice = devices.first(where: { $0.position == .front && $0.deviceType == .builtInTrueDepthCamera })

        guard let videoDevice = currentDevice else {
            captureSession.commitConfiguration()
            return
        }
        
        videoDeviceInput = try AVCaptureDeviceInput(device: videoDevice)

        if captureSession.canAddInput(videoDeviceInput) {
            captureSession.addInput(videoDeviceInput)
        } else {
            captureSession.commitConfiguration()
            return
        }
        
        // Audio input: per the question, removing this makes the returned mattes nil.
        // Kept in a separate constant so `currentDevice` still refers to the TrueDepth camera.
        guard let audioDevice = AVCaptureDevice.default(for: .audio) else {
            captureSession.commitConfiguration()
            return
        }
        captureDeviceInput = try AVCaptureDeviceInput(device: audioDevice)

        if captureSession.canAddInput(captureDeviceInput) {
            captureSession.addInput(captureDeviceInput)
        } else {
            captureSession.commitConfiguration()
            return
        }
    } catch {
        errorMessage = error.localizedDescription
        print(error.localizedDescription)
        captureSession.commitConfiguration()
        return
    }

    if captureSession.canAddOutput(photoOutput) {
        captureSession.addOutput(photoOutput)

        photoOutput.isHighResolutionCaptureEnabled = true
        photoOutput.isLivePhotoCaptureEnabled = photoOutput.isLivePhotoCaptureSupported
        photoOutput.isDepthDataDeliveryEnabled = photoOutput.isDepthDataDeliverySupported
        photoOutput.isPortraitEffectsMatteDeliveryEnabled = photoOutput.isPortraitEffectsMatteDeliverySupported
        photoOutput.enabledSemanticSegmentationMatteTypes = photoOutput.availableSemanticSegmentationMatteTypes
      
        photoOutput.maxPhotoQualityPrioritization = .balanced
    }
    captureSession.commitConfiguration()
}

private func setupPreviewLayer(_ view: UIView) {
    guard let captureSession = captureSession else { return }
    let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
    previewLayer.videoGravity = .resizeAspectFill
    previewLayer.connection?.videoOrientation = .portrait
    previewLayer.frame = view.frame
    view.layer.insertSublayer(previewLayer, at: 0)
    self.cameraPreviewLayer = previewLayer
}
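
For reference, here is a minimal sketch of how the mattes would typically be read in the callback mentioned above (illustrative only; it assumes the class adopts AVCapturePhotoCaptureDelegate and that the settings from setupCamera() were passed to capturePhoto(with:delegate:)):

func photoOutput(_ output: AVCapturePhotoOutput,
                 didFinishProcessingPhoto photo: AVCapturePhoto,
                 error: Error?) {
    if let error = error {
        print("Capture failed:", error.localizedDescription)
        return
    }
    // Each requested matte type arrives as a separate AVSemanticSegmentationMatte.
    for matteType in output.enabledSemanticSegmentationMatteTypes {
        if let matte = photo.semanticSegmentationMatte(for: matteType) {
            let buffer = matte.mattingImage   // CVPixelBuffer containing the mask
            print("Matte \(matteType.rawValue): \(CVPixelBufferGetWidth(buffer))x\(CVPixelBufferGetHeight(buffer))")
        } else {
            print("Matte \(matteType.rawValue) is nil")   // the behavior described in the question
        }
    }
}
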
Jovan Stankovic
  • Yes, it definitely should be. I can only imagine that the session changes internal settings when you add an audio input, on the assumption that you want to record video. Can you maybe share your session setup code? – Frank Rupprecht Jul 21 '20 at 05:51
  • Thanks for the reply. I've added the setup code. It works, but when I comment out the lines that add the AVCaptureDeviceInput for audio, the callback gives nil for the segmentation mattes. Please note that I'm first calling setupSession() and then setupCamera(). – Jovan Stankovic Jul 21 '20 at 08:00
  • My guess is that adding the audio input will change the `sessionPreset` to some video format, which will change the delivery settings for the matte (since video doesn't support the segmentation mattes). I think you need two different configurations for the different use cases (portrait photo vs. video recording). – Frank Rupprecht Jul 21 '20 at 08:31
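
One way to test that hypothesis (a minimal sketch, assuming the captureSession/photoOutput properties from the code above) is to log the preset and the matte-related state right after commitConfiguration(), once with the audio input added and once without:

private func logConfigurationState() {
    guard let session = captureSession, let output = photoOutput else { return }
    print("sessionPreset:", session.sessionPreset.rawValue)
    print("available mattes:", output.availableSemanticSegmentationMatteTypes.map { $0.rawValue })
    print("enabled mattes:", output.enabledSemanticSegmentationMatteTypes.map { $0.rawValue })
    print("depth delivery supported/enabled:", output.isDepthDataDeliverySupported, output.isDepthDataDeliveryEnabled)
}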

1 Answer


I was not able to get semantic segmentation mattes (SSM) at all, with or without setting up an audio input. I am currently developing on an iPhone X. After struggling for some time, I asked Apple about this in a 1-on-1 lab session during WWDC 2021. I was told that on my device the API only exposes the portrait effects matte. iPhone 11 and above can also get the skin, teeth and hair mattes, and the glasses matte they quietly added recently requires an iPhone 12.
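
A quick way to confirm which matte types a particular device actually exposes is to ask the configured photo output at runtime (a minimal sketch; photoOutput is assumed to be set up with the TrueDepth camera input as in the question):

// Empty on devices that only support the portrait effects matte (e.g. iPhone X);
// contains .skin/.hair/.teeth on iPhone 11 and later.
let available = photoOutput.availableSemanticSegmentationMatteTypes
print("Semantic segmentation mattes:", available.map { $0.rawValue })
print("Portrait effects matte supported:", photoOutput.isPortraitEffectsMatteDeliverySupported)

// Only enable what the output reports as available.
photoOutput.enabledSemanticSegmentationMatteTypes = available
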

Kaccie Li