
I am working on an app that receives a stream of packets with video and audio. I was able to decode the video and play it using AVSampleBufferDisplayLayer. (Code can be found here) But I've been struggling for over a week to decode the audio. At the beginning of the stream I receive an audio description like this -> sampleRate=48000, channelCount=1, and profileLevel=2. Then I continuously receive packets of AAC data, which I try to decode in a similar way to the video packets. At the start I create an AudioStreamBasicDescription and a CMAudioFormatDescription, and set up the audioRenderer and audioRendererSynchoronizer.

class AudioDecoderPlayer: NSObject {
  private var audioRenderer = AVSampleBufferAudioRenderer()
  private var audioRendererSynchoronizer = AVSampleBufferRenderSynchronizer()
  private let serializationQueue = DispatchQueue(label: "sample.buffer.player.serialization.queue")

  private var audioStreamBasicDescription: AudioStreamBasicDescription
  private var formatDescription: CMAudioFormatDescription?

  private var outputQueue: AudioQueueRef?

  var sampleBuffers: [CMSampleBuffer] = []
  
  let sampleRate: Double
  let channels: Int

  init(sampleRate: Double = 48000, channels: Int = 1, profileLevel: Int = 2) {

    self.sampleRate = sampleRate
    self.channels = channels
    
    let uChannels = UInt32(channels)
    let channelBytes = UInt32(MemoryLayout<Int16>.size)
    let bytesPerFrame = uChannels * channelBytes


    self.audioStreamBasicDescription = AudioStreamBasicDescription(
      mSampleRate: Float64(sampleRate),
      mFormatID: kAudioFormatMPEG4AAC,
      mFormatFlags: AudioFormatFlags(profileLevel),
      mBytesPerPacket: bytesPerFrame,
      mFramesPerPacket: 1,
      mBytesPerFrame: bytesPerFrame,
      mChannelsPerFrame: uChannels,
      mBitsPerChannel: channelBytes * 8,
      mReserved: 0
    )

    super.init()
    
    let status = CMAudioFormatDescriptionCreate(
      allocator: kCFAllocatorDefault,
      asbd: &audioStreamBasicDescription,
      layoutSize: 0,
      layout: nil,
      magicCookieSize: 0,
      magicCookie: nil,
      extensions: nil,
      formatDescriptionOut: &formatDescription)

    if status != noErr {
      fatalError("unable to create audio format description")
    }
    
    
    audioRendererSynchoronizer.addRenderer(audioRenderer)
    subscribeToAudioRenderer()
    startPlayback()
  }
  
  func subscribeToAudioRenderer() {
    audioRenderer.requestMediaDataWhenReady(on: serializationQueue, using: { [weak self] in
      guard let strongSelf = self else {
        return
      }
      
      while strongSelf.audioRenderer.isReadyForMoreMediaData {
        if let sampleBuffer = strongSelf.nextSampleBuffer() {
          strongSelf.audioRenderer.enqueue(sampleBuffer)
        }
      }
    })
  }
  
  func startPlayback() {
    serializationQueue.async {
      if self.audioRendererSynchoronizer.rate != 1 {
        self.audioRendererSynchoronizer.rate = 1
        self.audioRenderer.volume = 1.0
      }
    }
  }
  
  func nextSampleBuffer() -> CMSampleBuffer? {
    guard sampleBuffers.count > 0 else {
      return nil
    }
    
    let sampleBuffer = sampleBuffers.first
    sampleBuffers.remove(at: 0)
    
    return sampleBuffer
  }

And this is what my decode function looks like:

  func decodeAudioPacket(data: Data) {
    let headerValue = UInt32(data.count)

    // add the data length at the beginning
    var sizedData = withUnsafeBytes(of: headerValue.bigEndian) { Data($0) }
    sizedData.append(data)

    let blockBuffer = sizedData.toCMBlockBuffer()

    // Outputs from CMSampleBufferCreate
    var sampleBuffer: CMSampleBuffer?

    let result = CMAudioSampleBufferCreateReadyWithPacketDescriptions(
      allocator: kCFAllocatorDefault,
      dataBuffer: blockBuffer,
      formatDescription: formatDescription!,
      sampleCount: 1,
      presentationTimeStamp: CMTime(value: 1, timescale: Int32(sampleRate)),
      packetDescriptions: nil,
      sampleBufferOut: &sampleBuffer)


    if result != noErr {
      fatalError("CMSampleBufferCreate() failed")
    }

    sampleBuffers.append(sampleBuffer!)
  }
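
For reference, toCMBlockBuffer() is a small Data extension of mine that is not shown here. A minimal sketch of such a helper, assuming it does nothing more than copy the bytes into a newly allocated CMBlockBuffer (and bails out with fatalError on failure, like the rest of this code), could look like this:

  import CoreMedia

  extension Data {
    // Sketch: wrap the bytes of this Data in a freshly allocated CMBlockBuffer.
    func toCMBlockBuffer() -> CMBlockBuffer {
      var blockBuffer: CMBlockBuffer?
      var status = CMBlockBufferCreateWithMemoryBlock(
        allocator: kCFAllocatorDefault,
        memoryBlock: nil,              // let Core Media allocate `count` bytes
        blockLength: count,
        blockAllocator: kCFAllocatorDefault,
        customBlockSource: nil,
        offsetToData: 0,
        dataLength: count,
        flags: 0,
        blockBufferOut: &blockBuffer)

      guard status == kCMBlockBufferNoErr, let buffer = blockBuffer else {
        fatalError("unable to create block buffer")
      }

      // Copy the packet bytes into the block buffer.
      status = withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> OSStatus in
        CMBlockBufferReplaceDataBytes(
          with: raw.baseAddress!,
          blockBuffer: buffer,
          offsetIntoDestination: 0,
          dataLength: count)
      }

      guard status == kCMBlockBufferNoErr else {
        fatalError("unable to copy bytes into block buffer")
      }

      return buffer
    }
  }

The important point is that the resulting CMBlockBuffer owns its own copy of the packet bytes, so the Data can be released afterwards.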

I found out that the requestMediaDataWhenReady block is only called once, which suggests that no sound is being played, but I don't understand why. I was able to get it running by adapting code from WWDC 2017 session 509, but there is still no sound being played. (Code can be found here) I've also tried various solutions with AudioQueue, but with no success. (For some reason the callbacks from AudioFileStreamOpen were never called.) (Code can be found here and here) But I would prefer to solve it using AVSampleBufferAudioRenderer, as I believe it should be easier, and I also want to use AVSampleBufferRenderSynchronizer to synchronize the video with the audio.

Any suggestions on what I am doing wrong are appreciated.

Thank you

Ender
  • The basic stream description looks wrong and you don’t seem to be providing packet descriptions. Do you have a runnable code snippet? – Rhythmic Fistman Dec 19 '20 at 06:57
  • I was unable to play the audio using AVSampleBufferAudioRenderer. I had to use AudioQueue which required more work but gets the job done. – Ender Dec 22 '20 at 16:10
  • Hello @Ender, I am also trying the same thing as you were (playing the audio packets from a socket). But I am using AVAudioEngine and I am having a glitch between the audio packets. Why did you prefer AudioQueue over AVAudioEngine? I don't see a proper example of using AudioQueue. Can you show me sample code implementing the AudioQueue? My result is similar to this: https://storage.googleapis.com/webfundamentals-assets/videos/gap.webm – sudayn Jul 02 '21 at 11:48
  • Hi @sudayn, these glitches also occurred to me. The problem was that the stream was not continuous, e.g. I restarted the AudioQueue or removed a lot of packets to sync it with the video. To answer your second question about AudioQueue vs. AVAudioEngine - to be honest, I do not remember why I chose AudioQueue. I wanted to use the AVSampleBufferAudioRenderer and then I switched to AudioQueue. – Ender Jul 07 '21 at 09:28

0 Answers