This may be an obscure question, but I see lots of very cool samples online of how people are using the new people-occlusion technology in ARKit 3 to effectively "separate" people from the background and apply some sort of filtering to the "people" (see here).

In looking at Apple's provided source code and documentation, I see that I can retrieve the segmentationBuffer from an ARFrame, which I've done, like so:

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let image = frame.capturedImage
    if let segmentationBuffer = frame.segmentationBuffer {

        // Get the segmentation buffer's width.
        let segmentedWidth = CVPixelBufferGetWidth(segmentationBuffer)

        // Create the mask from that pixel buffer.
        let segmentationMaskImage = CIImage(cvPixelBuffer: segmentationBuffer, options: [:])

        // Smooth edges to create an alpha matte, then upscale it to the RGB resolution.
        let alphaUpscaleFactor = Float(CVPixelBufferGetWidth(image)) / Float(segmentedWidth)
        let alphaMatte = segmentationMaskImage.clampedToExtent()
            .applyingFilter("CIGaussianBlur", parameters: ["inputRadius": 2.0])
            .cropped(to: segmentationMaskImage.extent)
            .applyingFilter("CIBicubicScaleTransform", parameters: ["inputScale": alphaUpscaleFactor])

        // Unknown...

    }
}

In the "unknown" section, I am trying to determine how I would render my new "blurred" person on top of the original camera feed. There does not seem to be any methods to draw the new CIImage on "top" of the original camera feed, as the ARView has no way of being manually updated.

ZbadhabitZ

2 Answers

In the following code snippet you can see the personSegmentationWithDepth type property used for depth compositing (there are RGB, Alpha and Depth channels):

// Automatically segmenting and then compositing foreground (people), 
// middle-ground (3D model) and background.

let session = ARSession()
let configuration = ARWorldTrackingConfiguration()

if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
    configuration.frameSemantics.insert(.personSegmentationWithDepth)
}
session.run(configuration)

You can manually access the depth data of World Tracking as a CVPixelBuffer (the depth values for the performed segmentation):

let image = frame.estimatedDepthData

And you can manually access the depth data of Face Tracking as a CVPixelBuffer (from the TrueDepth camera):

let image = session.currentFrame?.capturedDepthData?.depthDataMap
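
For instance, here is a rough sketch of reading a single metric depth value out of estimatedDepthData; the helper function is hypothetical (not from Apple's docs) and assumes the buffer stores one 32-bit float depth value per pixel:

import ARKit

// Hypothetical helper: read the depth (in meters) at pixel (x, y).
// Assumes x and y lie within the buffer's width and height.
func depthValue(at x: Int, _ y: Int, in frame: ARFrame) -> Float? {
    guard let depthBuffer = frame.estimatedDepthData else { return nil }

    CVPixelBufferLockBaseAddress(depthBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthBuffer, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(depthBuffer) else { return nil }
    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthBuffer)

    // Step to the start of row y, then read the x-th Float32 in that row.
    let rowPointer = base.advanced(by: y * bytesPerRow)
    return rowPointer.assumingMemoryBound(to: Float32.self)[x]
}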

Also, ARKit 3.0's ARMatteGenerator class has a generateDilatedDepth(from:commandBuffer:) instance method:

func generateDilatedDepth(from frame: ARFrame,
                          commandBuffer: MTLCommandBuffer) -> MTLTexture
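
As a minimal sketch of how ARMatteGenerator might be driven (the device/queue setup and the matteTextures function name below are illustrative assumptions, not Apple's sample code):

import ARKit
import Metal

// Assumes an ARSession running a configuration with .personSegmentationWithDepth.
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
let matteGenerator = ARMatteGenerator(device: device, matteResolution: .half)

func matteTextures(for frame: ARFrame) -> (matte: MTLTexture, dilatedDepth: MTLTexture)? {
    guard let commandBuffer = commandQueue.makeCommandBuffer() else { return nil }

    // Alpha matte separating people from the background.
    let matte = matteGenerator.generateMatte(from: frame, commandBuffer: commandBuffer)

    // Dilated depth, used to decide per pixel whether a person occludes virtual content.
    let dilatedDepth = matteGenerator.generateDilatedDepth(from: frame, commandBuffer: commandBuffer)

    // In a real renderer you would encode further work before committing,
    // then sample these textures in your shaders.
    commandBuffer.commit()
    return (matte, dilatedDepth)
}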

In your case you have to use estimatedDepthData, because the Apple documentation says:

It's a buffer that represents the estimated depth values from the camera feed that you use to occlude virtual content.

var estimatedDepthData: CVPixelBuffer? { get }

If you multiply the DEPTH data from this buffer (first you have to convert the Depth channel to RGB) by the RGB or ALPHA channels using compositing techniques, you'll get awesome effects.
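
For example, here is a loose Core Image sketch of that idea; the function and parameter names are mine, and it assumes the depth values have already been normalized to the 0...1 range so they act like a grayscale mask:

import CoreImage
import CoreVideo

// Hypothetical helper: upscale the (lower-resolution) normalized depth image
// to the camera frame's size and multiply it against the RGB frame.
func multiplyDepth(onto cameraImage: CIImage, normalizedDepth depthBuffer: CVPixelBuffer) -> CIImage {
    let depthImage = CIImage(cvPixelBuffer: depthBuffer)

    // Match the camera image's resolution.
    let scale = cameraImage.extent.width / depthImage.extent.width
    let scaledDepth = depthImage
        .applyingFilter("CIBicubicScaleTransform", parameters: ["inputScale": scale])

    // CIMultiplyCompositing multiplies the input image by the background image per pixel.
    return scaledDepth.applyingFilter("CIMultiplyCompositing", parameters: [
        kCIInputBackgroundImageKey: cameraImage
    ])
}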

Look at these six images: the lower row shows three RGB images corrected with the Depth channel (depth grading, depth blurring, and a depth point-position pass).

[Image: six renders; the lower row shows the three depth-corrected RGB images]

Andy Jazz
  • Thanks for your reply, @ARGeo. This doesn’t explain how I can use the segmentationBuffer/depth data map, however. If I wanted to run the segmented person through a CIFilter, then render that on screen, I’m not sure how I would achieve what the sample video linked in the question is achieving. – ZbadhabitZ Jul 20 '19 at 21:41
  • This is very, very helpful, @ARGeo. I actually was not aware of how to use the estimated depth data in conjunction with the segmentation buffer, and this does clarify the purpose of the two. With that said, presuming I successfully convert the depth data to RGB, I'm still unclear how I would actually "superimpose" my filtered buffers on top of the view. In an ARSession, the ARView is preconfigured by iOS and does not seem to have any method to manually update the view; in a sense, I'm getting the buffers between the time the camera sees them and the time they're rendered; how do I render? – ZbadhabitZ Jul 21 '19 at 20:45
  • For superimposition use `CISourceOverCompositing` (it uses the formula `Argb*Aa + Brgb*(1-Aa)`, where A = foreground, B = background). As for updating a view, I can't say anything because at the moment I can't install iOS 13, so I can't test any methods. – Andy Jazz Jul 21 '19 at 20:56
  • Here `Argb*Aa` is the premultiplied RGBA foreground image, and `(1-Aa)` is the inverted alpha of the foreground image (the alpha inversion makes a hole in the background image, allowing you to place the premultiplied foreground image right into that hole). – Andy Jazz Jul 21 '19 at 21:08
  • Thanks for all of the great detail, @ARGeo. Your explanation does help quite a bit in understanding what the varying channels are doing, and how to properly calculate them. Thank you for that! The `CISourceOverCompositing` is also helpful, and I feel it is 50% of the missing piece. Now I have to figure out how to take the composited image and indeed display it on screen, as I still lack the know-how to bring a composited CIImage (or CVPixelBuffer) onto the display on top of an ARFrame. – ZbadhabitZ Jul 21 '19 at 22:50

The Bringing People into AR WWDC session has some information, especially about ARMatteGenerator. The session also comes with sample code.

mnuages
  • Thanks, @mnuages. I did have a look at that session, as well as the sample code you linked to, though I am not quite sure how to put all of this together. The sample code in that project references building your own renderer for the ARSession, requiring Metal to draw the segmented person and camera feed. Is there a way to achieve the same thing without using a custom renderer, just by using the segmentationBuffer and estimatedDepthData and modifying the ARView, rather than building my own MTKView? – ZbadhabitZ Jul 23 '19 at 21:33
  • ^ This is exactly my question. Is there any update on this? – Friendly King Aug 20 '19 at 22:25
  • Also wondering if there's an answer to this question, for example using ARFrame instead as an input to your occlusion/rendering system. – KFDoom May 28 '20 at 07:31