I recently started programming in Swift as I am trying to work out an iOS camera app idea I've had. The main goal of the project is to save the 10 seconds of video prior to the record button being tapped. So the app is actually always capturing and storing frames, while discarding frames that are more than 10 seconds old whenever the app is not 'recording'.
My approach is to output video and audio data from the AVCaptureSession using AVCaptureVideoDataOutput() and AVCaptureAudioDataOutput() respectively. In captureOutput() I receive a CMSampleBuffer for both video and audio, which I store in separate arrays. I would like those arrays to later serve as input for the AVAssetWriter.
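A stripped-down sketch of that setup, assuming one class acts as the delegate for both outputs (names are illustrative, error handling and startRunning() omitted):

```swift
import AVFoundation

final class RollingCaptureManager: NSObject,
    AVCaptureVideoDataOutputSampleBufferDelegate,
    AVCaptureAudioDataOutputSampleBufferDelegate {

    let session = AVCaptureSession()
    private let videoOutput = AVCaptureVideoDataOutput()
    private let audioOutput = AVCaptureAudioDataOutput()
    private let outputQueue = DispatchQueue(label: "capture.output")

    // Rolling buffers holding the most recent samples.
    private var videoBuffers: [CMSampleBuffer] = []
    private var audioBuffers: [CMSampleBuffer] = []

    func configure() {
        session.beginConfiguration()

        if let camera = AVCaptureDevice.default(for: .video),
           let videoInput = try? AVCaptureDeviceInput(device: camera),
           session.canAddInput(videoInput) {
            session.addInput(videoInput)
        }
        if let mic = AVCaptureDevice.default(for: .audio),
           let audioInput = try? AVCaptureDeviceInput(device: mic),
           session.canAddInput(audioInput) {
            session.addInput(audioInput)
        }

        videoOutput.setSampleBufferDelegate(self, queue: outputQueue)
        audioOutput.setSampleBufferDelegate(self, queue: outputQueue)
        if session.canAddOutput(videoOutput) { session.addOutput(videoOutput) }
        if session.canAddOutput(audioOutput) { session.addOutput(audioOutput) }

        session.commitConfiguration()
    }

    // Both data outputs call this delegate method; I check which output
    // delivered the buffer and store it in the corresponding array.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        if output == videoOutput {
            videoBuffers.append(sampleBuffer)
        } else if output == audioOutput {
            audioBuffers.append(sampleBuffer)
        }
        // ...drop buffers older than 10 seconds here...
    }
}
```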
This is the point where I'm unsure about the role of time and timing regarding the sample buffers and the capture session in general, because (I believe) in order to present the sample buffers to the AVAssetWriter as input, I need to make sure my video and audio data are the same length (duration-wise) and synchronized.
I currently need to figure out at what rate the capture session is running, or how I can set that rate. Ideally I would have one audioSampleBuffer for each videoSampleBuffer, both representing exactly the same duration. I don't know what realistic values are, but in the end my goal is to output 60 fps, so it would be perfect if each videoSampleBuffer contained 1 frame and each audioSampleBuffer represented 1/60th of a second. I could then easily append the newest sample buffers to the arrays and drop the oldest.
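If I understand the Core Media API correctly, each sample buffer already carries its own presentation timestamp and duration, so I assume I can at least inspect the timing per buffer and trim the arrays based on it, roughly like this (a sketch, not tested):

```swift
import CoreMedia

// Read the timing information that each CMSampleBuffer carries.
func logTiming(of sampleBuffer: CMSampleBuffer) {
    let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    let duration = CMSampleBufferGetDuration(sampleBuffer)
    print("pts: \(CMTimeGetSeconds(pts)) s, duration: \(CMTimeGetSeconds(duration)) s")
}

// Drop everything more than 10 seconds older than the newest buffer.
func trim(_ buffers: inout [CMSampleBuffer], newestPTS: CMTime) {
    let cutoff = CMTimeSubtract(newestPTS, CMTime(seconds: 10, preferredTimescale: 600))
    buffers.removeAll { CMSampleBufferGetPresentationTimeStamp($0) < cutoff }
}
```

But that still doesn't tell me whether the video and audio buffers will line up the way I described above.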
I've of course done some research on this problem, but wasn't able to find what I was looking for.
My initial thought was that I had to let the capture session run at some sort of set timescale, but I didn't see such an option in the AVFoundation documentation. I then looked into Core Media to see if there was some way to set the clock the capture session uses, but I couldn't find a way to tell the session to use a different CMClock (with properties I know), so I gave up on this route. I still wasn't sure about the internal mechanics and timing of the capture session, so I tried to find more information about it, but without much luck. I've also stumbled upon the synchronizationClock property of AVCaptureSession, but I couldn't find out how to use it or find an example.
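My guess is that it would be read along these lines to get the session's current time, but I'm not sure that's the intended use or how it relates to the buffers' timestamps:

```swift
import AVFoundation

// My guess at reading the clock the session runs on:
// synchronizationClock on iOS 15.4+, masterClock on earlier versions.
func currentSessionTime(of session: AVCaptureSession) -> CMTime? {
    let clock: CMClock?
    if #available(iOS 15.4, *) {
        clock = session.synchronizationClock
    } else {
        clock = session.masterClock
    }
    guard let clock = clock else { return nil }
    return CMClockGetTime(clock)
}
```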
At this point my best guess is that with every step in time (represented by a timestamp) a new sample buffer is created for both video and audio, which would be a good thing. But I have a feeling this is just wishful thinking, and even then I still wouldn't know what duration those buffers represent.
Could anyone point me in the right direction and help me understand how time works in a capture session, and how to get or set the duration of sample buffers?