How to apply Vision Framework to Video Playback

Question

The examples I found mostly apply the Vision framework to live camera capture, which I already have working. I also want to run body pose detection and draw the results during video playback. The following code already plays back a video stored on my device:

            let videoURL = URL(fileURLWithPath: NSString.path(withComponents: [documentsDirectory, path]) as String)
            let player = AVPlayer(url: videoURL)
            let vc = AVPlayerViewController()
            vc.player = player
            
            present(vc, animated: true) {
                vc.player?.play()
            }

How can I send a modified version of the video to the player which uses something like this to first detect persons in the video using the Vision Framework:

        let visionRequestHandler = VNImageRequestHandler(cgImage: frame)

        // Use Vision to find human body poses in the frame.
        do { try visionRequestHandler.perform([humanBodyPoseRequest]) } catch {
            assertionFailure("Human Pose Request failed: \(error)")
        }

        let poses = Pose.fromObservations(humanBodyPoseRequest.results)

on each frame of the video, and then draw each pose onto the respective video frame before sending it to the AVPlayer:

        pose.drawWireframeToContext(cgContext, applying: pointTransform)

I don't know how to do it with AVPlayer, but this Apple demo app shows how to output a pre-recorded video into a camera view and then follow (almost) normal Vision detection: https://developer.apple.com/documentation/vision/detecting_moving_objects_in_a_video — timbre timbre, Jun 01 '23 at 16:32
Thanks. I saw that example, and there is a similar one here: https://developer.apple.com/documentation/vision/building_a_feature-rich_app_for_sports_analysis The main problem I have is that those examples don't give you video controls. I will try to implement a solution similar to this one and see if I can create an AVAsset: https://img.ly/blog/add-text-to-video-in-swift/ — Philipp Dobrigkeit, Jun 05 '23 at 07:52
Did you consider using SKVideoNode? Not sure if it helps, but your use case seems closer to an AR overlay, which is what it supports. — timbre timbre, Jun 05 '23 at 14:04

score 0 · Answer 1 · answered Jun 09 '23 at 10:26

            let videoAsset = AVAsset(url: videoURL)
            // The handler is called once per frame; request.sourceImage is a CIImage.
            let poseComposition = AVMutableVideoComposition(asset: videoAsset) { request in
                let poses = self.findPosesInFrame(request.sourceImage)
                let poseImage = self.drawPoses(poses, onto: request.sourceImage)
                request.finish(with: poseImage, context: nil)
            }
            
            // Attach the composition so the player shows the annotated frames.
            let videoItem = AVPlayerItem(asset: videoAsset)
            videoItem.videoComposition = poseComposition
            
            return AVPlayer(playerItem: videoItem)

I'll leave this here for other people to find. Using AVMutableVideoComposition with my existing code, plus slight changes to the Detecting Body Poses example so that it works with CIImage, did the trick. Thanks for the comments.
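
The answer does not show its two helper methods, so here is a rough sketch of what per-frame helpers of this kind could look like. The names `detectBodyPoses(in:)` and `drawJoints(of:onto:)` are hypothetical and are not the author's `findPosesInFrame` / `drawPoses`. The sketch also draws plain dots for each joint rather than the wireframe from Apple's sample, and the 0.2 confidence cutoff and dot size are arbitrary assumptions.

```swift
import AVFoundation
import CoreGraphics
import CoreImage
import Vision

/// Hypothetical helper: runs body pose detection on a single frame.
/// Returns an empty array if the request fails or finds nobody.
func detectBodyPoses(in image: CIImage) -> [VNHumanBodyPoseObservation] {
    let request = VNDetectHumanBodyPoseRequest()
    let handler = VNImageRequestHandler(ciImage: image, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Human pose request failed: \(error)")
        return []
    }
    return request.results ?? []
}

/// Hypothetical helper: draws a dot for every reasonably confident joint
/// into a transparent bitmap and composites that bitmap over the frame.
func drawJoints(of observations: [VNHumanBodyPoseObservation],
                onto source: CIImage) -> CIImage {
    let extent = source.extent
    guard !extent.isInfinite, extent.width >= 1, extent.height >= 1,
          let context = CGContext(data: nil,
                                  width: Int(extent.width),
                                  height: Int(extent.height),
                                  bitsPerComponent: 8,
                                  bytesPerRow: 0,
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)
    else { return source }

    context.setFillColor(CGColor(red: 1, green: 0, blue: 0, alpha: 1))
    for observation in observations {
        guard let joints = try? observation.recognizedPoints(.all) else { continue }
        // Assumed threshold: skip joints Vision is not confident about.
        for point in joints.values where point.confidence > 0.2 {
            // Vision points are normalized with a lower-left origin,
            // which matches the CGContext and CIImage coordinate spaces.
            let x = point.location.x * extent.width
            let y = point.location.y * extent.height
            context.fillEllipse(in: CGRect(x: x - 4, y: y - 4, width: 8, height: 8))
        }
    }

    guard let overlay = context.makeImage() else { return source }
    // Shift the overlay to the frame's origin before compositing.
    let shift = CGAffineTransform(translationX: extent.origin.x, y: extent.origin.y)
    return CIImage(cgImage: overlay).transformed(by: shift).composited(over: source)
}
```

Inside the composition handler this would be used as `request.finish(with: drawJoints(of: detectBodyPoses(in: request.sourceImage), onto: request.sourceImage), context: nil)`. Allocating a new bitmap context for every frame is simple but probably not the fastest option, so reusing a context may be worth trying if playback stutters.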
