
I've found lots of examples online for working with audio in iOS, but most of them are pretty outdated and don't apply to what I'm trying to accomplish. Here's my project:

I need to capture audio samples from two sources: microphone input and stored audio files. I need to perform an FFT on these samples to produce a "fingerprint" for the entire clip, as well as apply some additional filters. The ultimate goal is to build a sort of song-recognition app similar to Shazam.

What is the best way to capture the individual audio samples in iOS 8 for performing a Fast Fourier Transform? I imagine ending up with a large array of them, but I suspect that it might not work quite like that. Secondly, how can I use the Accelerate framework for processing the audio? It seems to be the most efficient way to perform complex analysis on audio in iOS.

All the examples I've seen online are using older versions of iOS and Objective-C, and I haven't been able to successfully translate them into Swift. Does iOS 8 provide some new frameworks for this sort of thing?

Hundley
  • You could start by looking at Apple's own examples. They might be in Objective-C, but the APIs have not changed. All of the vDSP_xx functions have a C API in any case, and realistically, the analysis part of your project will probably want to be written in C or C++ (this is, incidentally, the advice from Apple engineers at this year's WWDC for writing audio processing/render handlers). As for audio fingerprinting, this is a non-trivial problem and too broad for SO. – marko Jun 20 '15 at 19:04
  • Did you find anything? – hoangpx Jan 11 '17 at 20:38

2 Answers


AVAudioEngine is the way to go for this. From Apple's docs:

  • For playback and recording of a single track, use AVAudioPlayer and AVAudioRecorder.
  • For more complex audio processing, use AVAudioEngine. AVAudioEngine includes AVAudioInputNode and AVAudioOutputNode for audio input and output. You can also use AVAudioNode objects for processing and mixing effects into your audio.

I'll be straight with you: AVAudioEngine is an extremely finicky API with vague documentation, rarely helpful error messages, and almost no online code examples demonstrating more than the most basic tasks. BUT if you take the time to get over the small learning curve, you can really do some magical things with it relatively easily.

I've built a simple "playground" view controller that demonstrates both microphone and audio file sampling working in tandem:

import UIKit
import AVFoundation

class AudioEnginePlaygroundViewController: UIViewController {
    private var audioEngine: AVAudioEngine!
    private var mic: AVAudioInputNode!
    private var micTapped = false
    override func viewDidLoad() {
        super.viewDidLoad()
        configureAudioSession()
        audioEngine = AVAudioEngine()
        mic = audioEngine.inputNode // non-optional in current SDKs; on older SDKs this returned an optional
    }

    static func getController() -> AudioEnginePlaygroundViewController {
        let me = AudioEnginePlaygroundViewController(nibName: "AudioEnginePlaygroundViewController", bundle: nil)
        return me
    }

    @IBAction func toggleMicTap(_ sender: Any) {
        if micTapped {
            mic.removeTap(onBus: 0)
            micTapped = false
            return
        }

        let micFormat = mic.inputFormat(forBus: 0)
        mic.installTap(onBus: 0, bufferSize: 2048, format: micFormat) { (buffer, when) in
            // Raw Float32 samples for the buffer's first channel.
            let sampleData = UnsafeBufferPointer(start: buffer.floatChannelData![0], count: Int(buffer.frameLength))
            // Analyze sampleData here (e.g. hand it to an FFT).
        }
        micTapped = true
        startEngine()
    }

    @IBAction func playAudioFile(_ sender: Any) {
        stopAudioPlayback()
        let playerNode = AVAudioPlayerNode()

        let audioUrl = Bundle.main.url(forResource: "test_audio", withExtension: "wav")!
        let audioFile = readableAudioFileFrom(url: audioUrl)
        audioEngine.attach(playerNode)
        audioEngine.connect(playerNode, to: audioEngine.outputNode, format: audioFile.processingFormat)
        startEngine()

        playerNode.scheduleFile(audioFile, at: nil) {
            playerNode.removeTap(onBus: 0)
        }
        playerNode.installTap(onBus: 0, bufferSize: 4096, format: playerNode.outputFormat(forBus: 0)) { (buffer, when) in
            // Same idea as the mic tap: raw samples for this chunk of the file.
            let sampleData = UnsafeBufferPointer(start: buffer.floatChannelData![0], count: Int(buffer.frameLength))
        }
        playerNode.play()
    }

    // MARK: Internal Methods

    private func configureAudioSession() {
        do {
            try AVAudioSession.sharedInstance().setCategory(AVAudioSessionCategoryPlayAndRecord, with: [.mixWithOthers, .defaultToSpeaker])
            try AVAudioSession.sharedInstance().setActive(true)
        } catch {
            print("AVAudioSession setup failed: \(error)")
        }
    }

    private func readableAudioFileFrom(url: URL) -> AVAudioFile {
        var audioFile: AVAudioFile!
        do {
            try audioFile = AVAudioFile(forReading: url)
        } catch {
            print("Could not read audio file: \(error)") // audioFile stays nil and will crash at the call site
        }
        return audioFile
    }

    private func startEngine() {
        guard !audioEngine.isRunning else {
            return
        }

        do {
            try audioEngine.start()
        } catch {
            print("Could not start audio engine: \(error)")
        }
    }

    private func stopAudioPlayback() {
        audioEngine.stop()
        audioEngine.reset()
    }
}

The audio samples are delivered to you in installTap's tap block, which is called repeatedly as audio passes through the tapped node (either the mic or the audio file player) in real time. You can access individual samples by indexing the sampleData pointer that I've created in each block.
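To connect this to the Accelerate part of your question: once you have those samples, vDSP is the right tool for the FFT. Below is a minimal sketch (my addition, not production code) of a real-to-complex FFT that turns one tap buffer into squared magnitudes per frequency bin. It assumes the sample count is a power of two, which holds for the 2048-frame mic tap above, and the magnitudeSpectrum name is just for illustration:

import Accelerate

// Computes squared magnitudes per frequency bin for a power-of-two-length
// buffer of time-domain samples (e.g. one 2048-frame tap buffer).
func magnitudeSpectrum(of samples: [Float]) -> [Float] {
    let n = samples.count
    let log2n = vDSP_Length(log2(Float(n)))
    guard let fftSetup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else {
        return []
    }
    defer { vDSP_destroy_fftsetup(fftSetup) }

    var realp = [Float](repeating: 0, count: n / 2)
    var imagp = [Float](repeating: 0, count: n / 2)
    var magnitudes = [Float](repeating: 0, count: n / 2)

    realp.withUnsafeMutableBufferPointer { realPtr in
        imagp.withUnsafeMutableBufferPointer { imagPtr in
            var splitComplex = DSPSplitComplex(realp: realPtr.baseAddress!,
                                               imagp: imagPtr.baseAddress!)
            // Pack the real samples into split-complex form.
            samples.withUnsafeBufferPointer { samplesPtr in
                samplesPtr.baseAddress!.withMemoryRebound(to: DSPComplex.self, capacity: n / 2) {
                    vDSP_ctoz($0, 2, &splitComplex, 1, vDSP_Length(n / 2))
                }
            }
            // In-place forward FFT, then squared magnitude of each bin.
            vDSP_fft_zrip(fftSetup, &splitComplex, 1, log2n, FFTDirection(FFT_FORWARD))
            vDSP_zvmagsq(&splitComplex, 1, &magnitudes, 1, vDSP_Length(n / 2))
        }
    }
    return magnitudes // unnormalized; vDSP_fft_zrip output is scaled by 2
}

Inside either tap block you'd call it with something like let spectrum = magnitudeSpectrum(of: Array(sampleData)). For a real fingerprinting pipeline you'd create the FFT setup once rather than per buffer, and apply a window (e.g. vDSP_hann_window) before transforming.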

WongWray
    This is what OP is looking for. Thanks for adding this answer years after the question was asked. – BigHeadCreations Oct 02 '18 at 15:53
    @BigHeadCreations Happy to combat the void of AVAudioEngine info. Thanks for the nod :) – WongWray Oct 02 '18 at 19:36
  • @WongWray Excellent answer. There are really few examples of this. I'm trying to do sample-rate conversion on the input tap before I call an ML predictor on it. I need 4 seconds of downsampled data, like a ring buffer, so I can call a prediction. Would you be interested in answering if I create a question for it? tnx – Spring Feb 04 '19 at 21:00
  • @WongWray How can I make the audio engine immediately play the recorded audio from the microphone? – Roman Samoilenko Jul 15 '19 at 06:26

Recording in iOS:

  • Create and maintain an instance of an AVAudioRecorder, as in var audioRecorder: AVAudioRecorder? = nil
  • Initialize your AVAudioRecorder with a URL to store the samples and some record settings

The recording session sequence:

  1. invoke prepareToRecord()
  2. invoke record()
  3. invoke stop()

Complete Swift/AVAudioRecorder Example

At the heart of your recording method, you could have:

func record() {
    self.prepareToRecord()
    if let recorder = self.audioRecorder {
        recorder.record()
    }
}

To prepare the recording (streaming to a file), you could have:

func prepareToRecord() {
    // Store the recording in the app's Documents directory.
    let documentsURL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    let soundFileURL = documentsURL.appendingPathComponent("recording.caf")

    do {
        self.audioRecorder = try AVAudioRecorder(url: soundFileURL, settings: recordSettings)
        self.audioRecorder?.prepareToRecord()
    } catch {
        print("Could not create recorder: \(error)")
    }
}

Finally, to stop the recording, use this:

func stopRecording() {
    if let recorder = self.audioRecorder {
        recorder.stop()
    }
}

The example above also needs import AVFoundation and some recordSettings, which are left to your choice. An example recordSettings might look like this:

let recordSettings: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatAppleLossless),
    AVEncoderAudioQualityKey: AVAudioQuality.max.rawValue,
    AVEncoderBitRateKey: 320000,
    AVNumberOfChannelsKey: 2,
    AVSampleRateKey: 44100.0
]

Do this and you're done.


You may also want to check out this Stack Overflow answer, which includes a demo project.
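If you then need the raw samples back out of the recorded file as an array of Floats (asked in the comments below), a minimal sketch using AVAudioFile and AVAudioPCMBuffer might look like this on recent SDKs. These classes decode to Float32 PCM for you; the samplesFromFile helper name is illustrative and it reads only the first channel:

import AVFoundation

// Loads an entire audio file into a Float array (first channel only),
// ready for FFT analysis.
func samplesFromFile(at url: URL) -> [Float]? {
    guard let file = try? AVAudioFile(forReading: url),
          let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                        frameCapacity: AVAudioFrameCount(file.length)) else {
        return nil
    }
    do {
        try file.read(into: buffer)
    } catch {
        return nil
    }
    guard let channelData = buffer.floatChannelData else { return nil }
    return Array(UnsafeBufferPointer(start: channelData[0],
                                     count: Int(buffer.frameLength)))
}

For a long recording you'd read in chunks rather than one giant buffer, but for fingerprinting a short clip this is usually fine.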

SwiftArchitect
    This info is helpful, but how can I extract the individual audio samples from a recording? I need the raw data - preferably an array of Floats on which I can perform analysis. Same question applies to a file that is already on disk. – Hundley Jun 24 '15 at 01:23
  • Assuming you use the `kAudioFormatAppleLossless` format above, the samples are stored in a CAF file documented on https://developer.apple.com/library/ios/documentation/MusicAudio/Reference/CAFSpec/CAF_overview/CAF_overview.html#//apple_ref/doc/uid/TP40001862-CH209-TPXREF101. Reading samples from such a file is answered at http://stackoverflow.com/questions/13996236/how-to-convert-wav-caf-files-sample-data-to-byte-array. – SwiftArchitect Jun 30 '15 at 23:36
    I found your http://swiftarchitect.com/recipes/#SO-32342486 to be very useful. Thanks. – vivin Apr 11 '16 at 16:30