
I'm receiving a stream of 16-bit / 48 kHz stereo PCM samples as Int16s and I'm trying to play them using AVAudioEngine, but I'm not hearing anything at all. I think it either has something to do with the way I set up the player or with the way I'm pushing the data into the buffer.

I have read a lot about alternative solutions using Audio Queue Services, but all the sample code I could find is either in Objective-C or iOS-only.

If I had any kind of frameSize issue or whatever, shouldn't I still be able to at least hear garbage coming out of my speakers?

This is my code:


import Foundation
import AVFoundation

class VoicePlayer {
    
    var engine: AVAudioEngine
    
    let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: 48000.0, channels: 2, interleaved: true)!
    let playerNode: AVAudioPlayerNode!
    var audioSession: AVCaptureSession = AVCaptureSession()
    
    init() {
        
        self.audioSession = AVCaptureSession()
        
        self.engine = AVAudioEngine()
        self.playerNode = AVAudioPlayerNode()
        
        self.engine.attach(self.playerNode)
        //engine.connect(self.playerNode, to: engine.mainMixerNode, format:AVAudioFormat.init(standardFormatWithSampleRate: 48000, channels: 2))
        /* If I set my custom format here, AVFoundation complains about the format not being available */
        engine.connect(self.playerNode, to: engine.outputNode, format:AVAudioFormat.init(standardFormatWithSampleRate: 48000, channels: 2))
        engine.prepare()
        try! engine.start()
        self.playerNode.play()
        
    }

    func play(buffer: [Int16]) {
        let interleavedChannelCount = 2
        let frameLength = buffer.count / interleavedChannelCount
        let audioBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(frameLength))!
        print("audio buffer size in frames is \(AVAudioFrameCount(frameLength))")
        // buffer contains 2 channel interleaved data
        // audioBuffer contains 2 channel interleaved data
        var buf = buffer
        let size = MemoryLayout<Int16>.stride * interleavedChannelCount * frameLength
        
        
        memcpy(audioBuffer.mutableAudioBufferList.pointee.mBuffers.mData, &buf, size)
        audioBuffer.frameLength = AVAudioFrameCount(frameLength)
        
        /* Implemented an AVAudioConverter for testing
           Input: 16 bit PCM 48kHz stereo interleaved
           Output: whatever the standard format for the system is

           Maybe this is somehow needed as my audio interface doesn't directly support 16 bit audio and can only run at 24 bit?
         */
        let normalBuffer = AVAudioPCMBuffer(pcmFormat: AVAudioFormat.init(standardFormatWithSampleRate: 48000, channels: 2)!, frameCapacity: AVAudioFrameCount(frameLength))
        normalBuffer?.frameLength = AVAudioFrameCount(frameLength)
        let converter = AVAudioConverter(from: format, to: AVAudioFormat.init(standardFormatWithSampleRate: 48000, channels: 2)!)
        var gotData = false

        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            if gotData {
                outStatus.pointee = .noDataNow
                return nil
            }
            gotData = true
            outStatus.pointee = .haveData
            return audioBuffer
        }

        var error: NSError? = nil
        let status: AVAudioConverterOutputStatus = converter!.convert(to: normalBuffer!, error: &error, withInputFrom: inputBlock)

        // Play the output buffer, in this case the audioBuffer, otherwise the normalBuffer
        // Playing the raw audio buffer causes an EXC_BAD_ACCESS on playback, playing back the buffer from the converter doesn't, but it still doesn't sound anything like a human voice
        self.playerNode.scheduleBuffer(audioBuffer) {
            print("Played")
        }
    }
}

Any help would be greatly appreciated.

Tobias Timpe

2 Answers


Once you copy data into an AVAudioPCMBuffer you need to set its frameLength property to indicate how much valid audio it contains.

func play(buffer: [Int16]) {
    let interleavedChannelCount = 2
    let frameLength = buffer.count / interleavedChannelCount
    let audioBuffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(frameLength))!

    // buffer contains 2 channel interleaved data
    // audioBuffer contains 2 channel interleaved data

    var buf = buffer
    memcpy(audioBuffer.mutableAudioBufferList.pointee.mBuffers.mData, &buf, MemoryLayout<Int16>.stride * interleavedChannelCount * frameLength)

    audioBuffer.frameLength = AVAudioFrameCount(frameLength)

    self.playerNode.scheduleBuffer(audioBuffer) {
        print("Played")
    }
}

Edit: Updated for changes to the question. The old, now-irrelevant portion:

Part of the problem is that there is an inconsistency in your formats: format is declared as non-interleaved, but buffer is a single array of Int16, so it presumably represents interleaved data. Copying one to the other directly is probably not correct.
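
For illustration, here is a minimal sketch of what a correct copy into a non-interleaved buffer could look like, splitting the channels via int16ChannelData. The helper name is hypothetical, and a 2-channel, non-interleaved Int16 format is assumed:

import AVFoundation

// Hypothetical helper: split interleaved Int16 samples into a
// non-interleaved (planar) AVAudioPCMBuffer, one plane per channel.
// Assumes `format` is .pcmFormatInt16 with interleaved: false.
func makeDeinterleavedBuffer(from samples: [Int16], format: AVAudioFormat) -> AVAudioPCMBuffer? {
    let channelCount = Int(format.channelCount)
    let frameLength = samples.count / channelCount
    guard let pcmBuffer = AVAudioPCMBuffer(pcmFormat: format,
                                           frameCapacity: AVAudioFrameCount(frameLength)),
          let channelData = pcmBuffer.int16ChannelData else { return nil }
    for channel in 0..<channelCount {
        for frame in 0..<frameLength {
            // Every channelCount-th sample belongs to this channel.
            channelData[channel][frame] = samples[frame * channelCount + channel]
        }
    }
    pcmBuffer.frameLength = AVAudioFrameCount(frameLength)
    return pcmBuffer
}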

sbooth
  • Oh, I am so sorry, I actually had the interleaved part completely wrong. Everything is interleaved in my audio chain. Your solution brought this up. I tried adjusting it with no luck. I updated the code above. – Tobias Timpe Dec 26 '20 at 20:50
  • That makes quite a difference. I've updated the answer. – sbooth Dec 27 '20 at 14:12
  • Thank you, this seems a lot easier. However, I am now getting an EXC_BAD_ACCESS when doing the memcpy call. These are my sizes: size = 7680, buffer is 3840 values, frameLength = 1920, AVAudioFrameCount(frameLength) = 1920. Is the buffer I'm trying to play too long, or is there something wrong with the size calculation? – Tobias Timpe Dec 28 '20 at 15:25
  • Those numbers seem to make sense for 16-bit stereo audio; 2 bytes per sample * 2 samples per frame = 4 bytes per frame, so 1920 frames for 7680 bytes seems right. Please post a complete (non)working example to help narrow down the problem. – sbooth Dec 28 '20 at 19:41
  • Thanks, I was able to add an audio engine to my audio stream using this. @sbooth – Arpit B Parekh Jun 02 '23 at 14:58
  • @sbooth I am getting crackling noise as output – Arpit B Parekh Jun 05 '23 at 13:58
  • @ArpitBParekh That could be caused by incorrect frame lengths but I'd recommend asking a new question with an example showing the problem – sbooth Jun 06 '23 at 13:01
  • @sbooth here is a question: https://stackoverflow.com/questions/76415463/playing-raw-audio-data-stream-coming-from-ble – Arpit B Parekh Jun 09 '23 at 04:54

I had a very similar problem and I managed to solve it using this post as a starting point. The differences: I'm using float instead of int data. I have some C code that receives the audio data from the network and writes it to the head of a circular buffer (I use TPCircularBuffer). My Swift code then reads the data from the tail of that same circular buffer and copies it to an AVAudioPCMBuffer, which I then pass to playerNode.scheduleBuffer(). I use a function that grabs the next data in this way, calls scheduleBuffer and passes itself as the completion handler, so data keeps being scheduled continuously. I got the best result using two such scheduling threads.

Otherwise the situation is the same. My input data is interleaved, as in your case. It seems to me that interleaved audio is simply not supported by AVAudioEngine. If you specify interleaved stereo as the input format in engine.connect(), you'll get a crash pointing to setFormat(). If instead you specify a non-interleaved format, as you did above, you are basically misinforming the framework about the data it's supposed to process, and you will get EXC_BAD_ACCESS errors.
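
To make that distinction concrete, here is a minimal sketch (the sample rate and channel count are assumed, and engine/playerNode are presumed to already exist): the interleaved format only describes the incoming data, while the connection to the engine uses the deinterleaved standard format.

    // Describes the raw interleaved data as it arrives from the network;
    // this format is NOT handed to engine.connect(). It is only useful as
    // the source format for a converter or a manual deinterleave step.
    let interleavedFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                          sampleRate: 48000,
                                          channels: 2,
                                          interleaved: true)!
    // The engine graph itself is wired up with the deinterleaved "standard"
    // (Float32, non-interleaved) format.
    let deinterleavedFormat = AVAudioFormat(standardFormatWithSampleRate: 48000, channels: 2)!
    engine.connect(playerNode, to: engine.outputNode, format: deinterleavedFormat)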

To fix this I first tried to copy the data from the circular buffer to an interleaved AVAudioPCMBuffer. Then I used AVAudioConverter to produce a new deinterleaved AVAudioPCMBuffer which I then scheduled for playback. Now I finally got audio. At first it was only a series of random distorted click noises, possibly similar to what you heard after conversion. In my case I tracked the problem down to a simple miscalculation: I had been dividing a sample count by numChannels and MemoryLayout<Float>.stride to calculate the byte count rather than multiplying the three. So I heard only a 64th of each packet. After that was fixed I heard actual speech, but the code was too slow to keep up with realtime data arrival. I ended up copying and converting the data in one operation, throwing out AVAudioConverter and doing it manually instead. Check out my code below.
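
As a minimal illustration of that arithmetic (the packet size here is made up):

    let framesPerPacket = 480   // hypothetical number of frames in one network packet
    let numChannels = 2
    // The byte count is the product of frames, channels and bytes per sample...
    let byteCount = framesPerPacket * numChannels * MemoryLayout<Float>.stride   // 480 * 2 * 4 = 3840
    // ...whereas dividing gives 480 / 2 / 4 = 60, i.e. only a 64th of the data.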

Two more points regarding your code:

  1. I find accessing the data in AVAudioPCMBuffer through buffer.floatChannelData (or in your case buffer.int16ChannelData) much more convenient than using buffer.mutableAudioBufferList.pointee.mBuffers.mData.
  2. I'm not sure why you used AVCaptureSession for your audio session. Maybe you had a sound reason, but I personally use AVAudioSession.sharedInstance() and audioSession.setCategory(.playback, mode: .spokenAudio, policy: .longForm) for playing back speech, and it works nicely.

Here's the relevant portion of working code:

    init() {
        do {
            if #available(iOS 11.0, *) {
                try audioSession.setCategory(.playback, mode: .spokenAudio, policy: .longForm)
            } else {
                try audioSession.setCategory(.playback, mode: .spokenAudio)
            }
        } catch {
            print("Failed to set audio session category. Error: \(error)")
        }
        engine = AVAudioEngine()
        playerNode = AVAudioPlayerNode()
        outputFormat = AVAudioFormat(standardFormatWithSampleRate: sampleRate, channels: numChannels)!
        circularBuffer = TPCircularBuffer()
        
        engine.attach(playerNode)
        engine.connect(playerNode, to: engine.outputNode, format: outputFormat)
        engine.prepare()
    }

    func start() {
        isPlayRequested = true
        _TPCircularBufferInit(&circularBuffer, bufferLength, MemoryLayout<TPCircularBuffer>.stride)
        networkStream_start(&circularBuffer) //this starts a loop in C code
        do {
            try audioSession.setActive(true)
        } catch {
            print("Failed to start audio session. Error: \(error)")
        }
        do {
            try engine.start()
        } catch {
            print("Failed to start audio engine. Error: \(error)")
        }

        for _ in 1...numSchedulers {
            scheduleNextData()
        }
        playerNode.play()
    }
    
    func getAndDeinterleaveNextData () -> AVAudioPCMBuffer {
        let inputBufferTail = TPCircularBufferTail(&circularBuffer, &availableBytes)
        let outputBuffer = AVAudioPCMBuffer(pcmFormat: outputFormat, frameCapacity: bufferLength)!
        if inputBufferTail != nil {
            let sampleCount = Int(availableBytes / numSchedulers / floatSize)
            let tailFloatPointer = inputBufferTail!.bindMemory(to: Float.self, capacity: sampleCount)
            for channel in 0..<Int(numChannels) {
                for sampleIndex in 0..<sampleCount {
                    outputBuffer.floatChannelData![channel][sampleIndex] = tailFloatPointer[sampleIndex * Int(numChannels) + channel]
                }
            }
            outputBuffer.frameLength = AVAudioFrameCount(sampleCount / Int(numChannels))
            TPCircularBufferConsume(&circularBuffer, outputBuffer.frameLength * numChannels * floatSize)
        }
        return outputBuffer
    }
    
    func scheduleNextData() {
        if isPlayRequested {
            let outputBuffer = getAndDeinterleaveNextData()
            playerNode.scheduleBuffer(outputBuffer, completionHandler: scheduleNextData)
        }
    }
jilipop