
I want to record a series of clips that, when played back to back via a video player or ffmpeg -f concat, play seamlessly.

In either scenario right now, I'm getting a very noticeable audio hiccup at each segment join point.

My current strategy is to maintain two AVAssetWriter instances. At each cutoff point I start a new writer, wait until it's ready, then start giving it samples. Once the video and audio samples are past that point in time, I close the previous writer.

How do I modify this to get continuous clip recording? What's the root cause of the hiccup?

import Foundation
import UIKit
import AVFoundation

class StreamController: UIViewController, AVCaptureAudioDataOutputSampleBufferDelegate, AVCaptureVideoDataOutputSampleBufferDelegate {
    @IBOutlet weak var previewView: UIView!

    var closingVideoInput: AVAssetWriterInput?
    var closingAudioInput: AVAssetWriterInput?
    var closingAssetWriter: AVAssetWriter?

    var currentVideoInput: AVAssetWriterInput?
    var currentAudioInput: AVAssetWriterInput?
    var currentAssetWriter: AVAssetWriter?

    var nextVideoInput: AVAssetWriterInput?
    var nextAudioInput: AVAssetWriterInput?
    var nextAssetWriter: AVAssetWriter?

    var previewLayer: AVCaptureVideoPreviewLayer?
    var videoHelper: VideoHelper?

    var startTime: NSTimeInterval = 0
    override func viewDidLoad() {
        super.viewDidLoad()
        startTime = NSDate().timeIntervalSince1970
        createSegmentWriter()
        videoHelper = VideoHelper()
        videoHelper!.delegate = self
        videoHelper!.startSession()
        NSTimer.scheduledTimerWithTimeInterval(5, target: self, selector: "createSegmentWriter", userInfo: nil, repeats: true)
    }

    func createSegmentWriter() {
        print("Creating segment writer at t=\(NSDate().timeIntervalSince1970 - self.startTime)")
        nextAssetWriter = try! AVAssetWriter(URL: NSURL(fileURLWithPath: OutputFileNameHelper.instance.pathForOutput()), fileType: AVFileTypeMPEG4)
        nextAssetWriter!.shouldOptimizeForNetworkUse = true

        let videoSettings: [String:AnyObject] = [AVVideoCodecKey: AVVideoCodecH264, AVVideoWidthKey: 960, AVVideoHeightKey: 540]
        nextVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo, outputSettings: videoSettings)
        nextVideoInput!.expectsMediaDataInRealTime = true
        nextAssetWriter?.addInput(nextVideoInput!)

        let audioSettings: [String:AnyObject] = [
                AVFormatIDKey: NSNumber(unsignedInt: kAudioFormatMPEG4AAC),
                AVSampleRateKey: 44100.0,
                AVNumberOfChannelsKey: 2,
        ]
        nextAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio, outputSettings: audioSettings)
        nextAudioInput!.expectsMediaDataInRealTime = true
        nextAssetWriter?.addInput(nextAudioInput!)

        nextAssetWriter!.startWriting()
    }

    override func viewDidAppear(animated: Bool) {
        super.viewDidAppear(animated)
        previewLayer = AVCaptureVideoPreviewLayer(session: videoHelper!.captureSession)
        previewLayer!.frame = self.previewView.bounds
        previewLayer!.videoGravity = AVLayerVideoGravityResizeAspectFill
        if previewLayer?.connection?.supportsVideoOrientation == true {
            previewLayer?.connection?.videoOrientation = AVCaptureVideoOrientation.LandscapeRight
        }
        self.previewView.layer.addSublayer(previewLayer!)
    }

    func closeWriter() {
        if videoFinished && audioFinished {
            let outputFile = closingAssetWriter?.outputURL.pathComponents?.last
            closingAssetWriter?.finishWritingWithCompletionHandler() {
                let delta = NSDate().timeIntervalSince1970 - self.startTime
                print("segment \(outputFile) finished at t=\(delta)")
            }
            self.closingAudioInput = nil
            self.closingVideoInput = nil
            self.closingAssetWriter = nil
            audioFinished = false
            videoFinished = false
        }
    }

    func closingVideoFinished() {
        if closingVideoInput != nil {
            videoFinished = true
            closeWriter()
        }
    }

    func closingAudioFinished() {
        if closingAudioInput != nil {
            audioFinished = true
            closeWriter()
        }
    }

    var closingTime: CMTime = kCMTimeZero
    var audioFinished = false
    var videoFinished = false
    func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBufferRef, fromConnection connection: AVCaptureConnection!) {
        let sampleTime: CMTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        if let nextWriter = nextAssetWriter {
            // The next writer has left .Unknown, i.e. startWriting() has taken effect.
            if nextWriter.status != .Unknown {
                print("Switching asset writers at t=\(NSDate().timeIntervalSince1970 - self.startTime)")

                closingAssetWriter = currentAssetWriter
                closingVideoInput = currentVideoInput
                closingAudioInput = currentAudioInput

                currentAssetWriter = nextAssetWriter
                currentVideoInput = nextVideoInput
                currentAudioInput = nextAudioInput

                nextAssetWriter = nil
                nextVideoInput = nil
                nextAudioInput = nil

                closingTime = sampleTime
                currentAssetWriter!.startSessionAtSourceTime(sampleTime)
            }
        }

        if currentAssetWriter != nil {
            if let _ = captureOutput as? AVCaptureVideoDataOutput {
                if (CMTimeCompare(sampleTime, closingTime) < 0) {
                    if closingVideoInput?.readyForMoreMediaData == true {
                        closingVideoInput?.appendSampleBuffer(sampleBuffer)
                    }
                } else {
                    closingVideoFinished()
                    if currentVideoInput?.readyForMoreMediaData == true {
                        currentVideoInput?.appendSampleBuffer(sampleBuffer)
                    }
                }

            } else if let _ = captureOutput as? AVCaptureAudioDataOutput {
                if (CMTimeCompare(sampleTime, closingTime) < 0) {
                    // Mirror the video path: audio from before the cut goes to the closing writer.
                    if closingAudioInput?.readyForMoreMediaData == true {
                        closingAudioInput?.appendSampleBuffer(sampleBuffer)
                    }
                } else {
                    closingAudioFinished()
                    if currentAudioInput?.readyForMoreMediaData == true {
                        currentAudioInput?.appendSampleBuffer(sampleBuffer)
                    }
                }
            }
        }
    }

    override func shouldAutorotate() -> Bool {
        return true
    }

    override func supportedInterfaceOrientations() -> UIInterfaceOrientationMask {
        return [UIInterfaceOrientationMask.LandscapeRight]
    }
}
Stefan Kendall

1 Answer


I think the root cause is that the video and audio CMSampleBuffers cover different time intervals, so an audio buffer can straddle your cut point. You need to split and join the audio CMSampleBuffers so they slot seamlessly into your AVAssetWriter's timeline, which should probably be based on the video presentation timestamps.
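
To make that concrete, you can check whether a given audio buffer straddles the cut. A minimal sketch against the Swift 2 APIs used in the question; the helper name and shape are mine, not anything from AVFoundation:

// Hypothetical helper: does this audio buffer start before the segment cut
// ("closingTime" in the question's code) but end after it? If so, neither
// writer can take it whole without leaving a gap or an overlap at the join.
func bufferStraddlesCut(sampleBuffer: CMSampleBufferRef, cutTime: CMTime) -> Bool {
    let start = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    let end = CMTimeAdd(start, CMSampleBufferGetDuration(sampleBuffer))
    return CMTimeCompare(start, cutTime) < 0 && CMTimeCompare(end, cutTime) > 0
}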

Why should audio have to change and not video? It seems asymmetric, but I guess it's because audio has the higher sampling rate.
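For example, assuming the capture pipeline delivers audio in roughly 1024-frame buffers at 44.1 kHz (a common size, though not guaranteed), each audio buffer spans about 23 ms, while at 30 fps each video buffer holds exactly one 33 ms frame, so an audio buffer boundary will almost never coincide with the video frame you cut on.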

p.s. actually creating the new split sample buffers looks intimidating. CMSampleBufferCreate has a tonne of arguments. CMSampleBufferCopySampleBufferForRange might be easier and more efficient to use.
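
For what it's worth, here's roughly what that route could look like. An untested sketch, not a definitive implementation: it assumes the buffer actually straddles cutTime, derives the split index from the buffer's frame count and duration, and skips error handling (the OSStatus results should be checked):

// Hypothetical helper: split an audio buffer into the frames before the cut
// (head) and the frames from the cut onwards (tail), using
// CMSampleBufferCopySampleBufferForRange over sample (frame) indices.
func splitAudioBuffer(sampleBuffer: CMSampleBufferRef, atTime cutTime: CMTime)
        -> (head: CMSampleBufferRef?, tail: CMSampleBufferRef?) {
    let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    let duration = CMSampleBufferGetDuration(sampleBuffer)
    let frameCount = CMSampleBufferGetNumSamples(sampleBuffer)

    // Fraction of the buffer that lies before the cut, converted to a frame index.
    let fraction = CMTimeGetSeconds(CMTimeSubtract(cutTime, pts)) / CMTimeGetSeconds(duration)
    let framesBeforeCut = Int(Double(frameCount) * fraction)

    var head: CMSampleBufferRef? = nil
    var tail: CMSampleBufferRef? = nil
    CMSampleBufferCopySampleBufferForRange(kCFAllocatorDefault, sampleBuffer,
            CFRangeMake(0, framesBeforeCut), &head)
    CMSampleBufferCopySampleBufferForRange(kCFAllocatorDefault, sampleBuffer,
            CFRangeMake(framesBeforeCut, frameCount - framesBeforeCut), &tail)
    return (head, tail)
}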

Rhythmic Fistman
  • Might it be possible to use CMSampleBufferCopySampleBufferForRange to cut the piece of the sample I need on one side, and also get the remainder for the next AVAssetWriter? I wonder if a given sample itself may jump the time border, but properly partitioning samples may be enough. It appears that a sample buffer will usually (always?) contain a single frame of video, but audio gets many samples, hence the asymmetry. – Stefan Kendall Nov 21 '15 at 04:03
  • `CMSampleBufferCopySampleBufferForRange` looks promising and video is always 1 frame. You're right, there's no reason why a sample _should_ fall on frame boundaries. But maybe AVFoundation will cut for you - an overhanging audio buffer might "just" work if you append it in both current & next writers. Or maybe you should split your files at audio buffer boundaries or audio sample boundaries, repeating/cutting video frames as necessary. This would give you constant file durations thanks to audio's constant sampling rate. – Rhythmic Fistman Nov 21 '15 at 10:51
  • 1
    splitting at an audio boundary is probably simpler. I'll try that. Constant file segment length is actually required +- seconds, but it'll be a nice feature to have regardless. – Stefan Kendall Nov 21 '15 at 15:19
  • Sounds like you're doing something interesting. The file chunks remind me of an instant replay app I was working on a long time ago. The hardware wasn't quite up to simultaneous encoding/decoding at the time. Probably a different story now. – Rhythmic Fistman Nov 22 '15 at 23:19
  • Any luck with splitting the buffers? I'm facing a similar issue http://stackoverflow.com/questions/43322157/split-cmsamplebufferref-containing-audio – Peter Lapisu Apr 10 '17 at 11:57
  • You can create two new buffers with `CMSampleBufferCreate` – Rhythmic Fistman Apr 10 '17 at 12:00
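
Pulling the comment thread together: within the question's two-writer scheme, the audio branch of captureOutput could use the hypothetical bufferStraddlesCut/splitAudioBuffer helpers sketched above, so the head of a straddling buffer finishes the closing segment and the tail starts the new one at exactly the cut. Again a sketch only, under the same assumptions:

// Inside the audio branch of captureOutput(_:didOutputSampleBuffer:fromConnection:).
// bufferStraddlesCut and splitAudioBuffer are the hypothetical helpers sketched above.
if bufferStraddlesCut(sampleBuffer, cutTime: closingTime) {
    let (head, tail) = splitAudioBuffer(sampleBuffer, atTime: closingTime)
    if let head = head where closingAudioInput?.readyForMoreMediaData == true {
        closingAudioInput?.appendSampleBuffer(head)       // frames before the cut
    }
    closingAudioFinished()
    if let tail = tail where currentAudioInput?.readyForMoreMediaData == true {
        currentAudioInput?.appendSampleBuffer(tail)       // frames from the cut onwards
    }
} else if CMTimeCompare(CMSampleBufferGetPresentationTimeStamp(sampleBuffer), closingTime) < 0 {
    if closingAudioInput?.readyForMoreMediaData == true {
        closingAudioInput?.appendSampleBuffer(sampleBuffer)
    }
} else {
    closingAudioFinished()
    if currentAudioInput?.readyForMoreMediaData == true {
        currentAudioInput?.appendSampleBuffer(sampleBuffer)
    }
}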