Boost / increase volume of text to speech (AVSpeechUtterance) to make it louder

Question

I have a navigation app that gives direction voice instruction (e.g. " In 200 feet turn left") using AVSpeechUtterance. I have put volume to 1 like so. speechUtteranceInstance.volume = 1, but still the volume is very low compared to the music or podcast coming from the iPhone, especially when the sound is on a Bluetooth or cabled connection (like connected to car with Bluetooth)

Is there any way to boost the volume? (I know this has been asked before on SO but so far have not found a solution that works for me.)

guido · Answer 1 · 2020-02-18T19:55:52.263

After a lot more research and playing around, I found a good workaround solution.

First of all I think this is an iOS bug. When all below conditions are true I found that the voice instruction itself is also ducked (or at least it sounds ducked) resulting in the voice instruction playing at the same volume as the DUCKED music (thus way too soft to hear well).

Playing music in the background
Ducking this background music through the .duckOther audioSessionCategory
Playing a voiceUtterance through AVSpeechSynthesizer
Playing audio over a connected bluetooth device (like bluetooth headset or bluetooth car speakers)

The workaround solution I found is to feed the speechUtterance to an AVAudioEngine. This can only be done on iOS13 or above, since that adds the .write method to AVSpeechSynthesizer

In short I use AVAudioEngine, AVAudioUnitEQ and AVAudioPlayerNode, setting the globalGain property of the AVAudioUnitEQ to about 10 dB. There are also a few quirks with this, but they can be worked around (see code comments).

Here's the complete code:

import UIKit
import AVFoundation
import MediaPlayer

class ViewController: UIViewController {

    // MARK: AVAudio properties
    var engine = AVAudioEngine()
    var player = AVAudioPlayerNode()
    var eqEffect = AVAudioUnitEQ()
    var converter = AVAudioConverter(from: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: 22050, channels: 1, interleaved: false)!, to: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)!)
    let synthesizer = AVSpeechSynthesizer()
    var bufferCounter: Int = 0

    let audioSession = AVAudioSession.sharedInstance()




    override func viewDidLoad() {
        super.viewDidLoad()



        let outputFormat = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: 22050, channels: 1, interleaved: false)!
        setupAudio(format: outputFormat, globalGain: 0)



    }

    func activateAudioSession() {
        do {
            try audioSession.setCategory(.playback, mode: .voicePrompt, options: [.mixWithOthers, .duckOthers])
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            print("An error has occurred while setting the AVAudioSession.")
        }
    }

    @IBAction func tappedPlayButton(_ sender: Any) {

        eqEffect.globalGain = 0
        play()

    }

    @IBAction func tappedPlayLoudButton(_ sender: Any) {
        eqEffect.globalGain = 10
        play()

    }

    func play() {
        let path = Bundle.main.path(forResource: "voiceStart", ofType: "wav")!
        let file = try! AVAudioFile(forReading: URL(fileURLWithPath: path))
        self.player.scheduleFile(file, at: nil, completionHandler: nil)
        let utterance = AVSpeechUtterance(string: "This is to test if iOS is able to boost the voice output above the 100% limit.")
        synthesizer.write(utterance) { buffer in
            guard let pcmBuffer = buffer as? AVAudioPCMBuffer, pcmBuffer.frameLength > 0 else {
                print("could not create buffer or buffer empty")
                return
            }

            // QUIRCK Need to convert the buffer to different format because AVAudioEngine does not support the format returned from AVSpeechSynthesizer
            let convertedBuffer = AVAudioPCMBuffer(pcmFormat: AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatFloat32, sampleRate: pcmBuffer.format.sampleRate, channels: pcmBuffer.format.channelCount, interleaved: false)!, frameCapacity: pcmBuffer.frameCapacity)!
            do {
                try self.converter!.convert(to: convertedBuffer, from: pcmBuffer)
                self.bufferCounter += 1
                self.player.scheduleBuffer(convertedBuffer, completionCallbackType: .dataPlayedBack, completionHandler: { (type) -> Void in
                    DispatchQueue.main.async {
                        self.bufferCounter -= 1
                        print(self.bufferCounter)
                        if self.bufferCounter == 0 {
                            self.player.stop()
                            self.engine.stop()
                            try! self.audioSession.setActive(false, options: [])
                        }
                    }

                })

                self.converter!.reset()
                //self.player.prepare(withFrameCount: convertedBuffer.frameLength)
            }
            catch let error {
                print(error.localizedDescription)
            }
        }
        activateAudioSession()
        if !self.engine.isRunning {
            try! self.engine.start()
        }
        if !self.player.isPlaying {
            self.player.play()
        }
    }

    func setupAudio(format: AVAudioFormat, globalGain: Float) {
        // QUIRCK: Connecting the equalizer to the engine somehow starts the shared audioSession, and if that audiosession is not configured with .mixWithOthers and if it's not deactivated afterwards, this will stop any background music that was already playing. So first configure the audio session, then setup the engine and then deactivate the session again.
        try? self.audioSession.setCategory(.playback, options: .mixWithOthers)

        eqEffect.globalGain = globalGain
        engine.attach(player)
        engine.attach(eqEffect)
        engine.connect(player, to: eqEffect, format: format)
        engine.connect(eqEffect, to: engine.mainMixerNode, format: format)
        engine.prepare()

        try? self.audioSession.setActive(false)

    }

}

Thank you very much for sharing this. I was trying forever to get this to work! — Zack Bartel, Aug 16 '20 at 03:32
This is a nice solution and it helped a lot. Thank you for that. It works in normal cases but in case of CarPlay it does not seem to work. On a quick internet search voicePrompt category does not allow any audio processing including AVAudioUnitEQ processing. Any thoughts on the same? — nishith Singh, Aug 03 '21 at 19:37
@nishithSingh Sorry. I don’t know. I am not using the voicePrompt category (perhaps that’s your answer… use a different category) and have not tested with CarPlay. — guido, Aug 03 '21 at 21:47

score 0 · Answer 2 · answered Jul 12 '19 at 03:12

0

The docs mention that the default for .volume is 1.0 and that's the loudest. Actual loudness is based on the user volume settings. I didn't really have an issue with the speech being not loud enough if the user has the volume turned up.

Maybe you could consider showing a visual warning if the user volume level is below a certain level. Seems like this answer shows how to do that via AVAudioSession.

AVAudioSession is worth exploring as there are some settings that do impact speech output... like for example does you app's speech interrupt audio from other apps.

answered Jul 12 '19 at 03:12

glotcha

558
6
13

1

My app is already ducking other audio (and pausing spoken audio). The problem is that users might be playing music from another app e.g. music app. So simply having the user turning the overall volume up or down is not an option as then the music volume would be too loud or too low. The relative volume of the speech is just very low compared to the music or podcast volume. – guido Jul 12 '19 at 03:17
I never noticed that before, but now that I check I agree with you. Also the slow fade in and out of the background music seems to make things worse.Is there any way to change the fade settings? TBH i think you might be restricted by the system. If your speech is not always dynamic you could output to audio files (seems to be available only on macOS) and then put them through an audio filter to boost the volume. – glotcha Jul 12 '19 at 03:51
I could try to also interrupts music (to change the fade settings). Unfortunately the speech is quite dynamic so I think output to audio files is not possible, unless that's nearly real-time? I'll look into it though. I've also thought about just increasing the overall volume (through MPVolumeView) when speech is spoken and then turn it back down once done speaking. Not sure. I'll post here if any of these work. Tha KS for your suggestions. Really appreciate it! – guido Jul 12 '19 at 03:56
No worries! One last think i would try, have a look to see if any big 3rd party apps manage to achieve what you want... it'll give you an idea of whether it's possible and maybe give you some hints / inspiration ;-) good luck @gudio – glotcha Jul 12 '19 at 04:00
I started to do a little work on speech today, i noticed that in iOS13 there are some updates to AVSpeechSynthesizer maybe worth a look, now has a AVAudioSession property and a mysterious mixToTelephonyUplink property – glotcha Jul 14 '19 at 03:14
Thanks. I'll look into that. Hope Apple gets this sorted. I've done some more trial and error. The volume is only this low when playing on Bluetooth or wired. Changing category or mode of audio session doesn't seem to make a difference. Also it seems related to the mixwithothers/duckothers options (but these are required for my app). My app is iOS12 and up so also need a solution for iOS12 too – guido Jul 14 '19 at 13:08
I've had some success with increasing the volume through MPVolumeView, but it's one big hack to do it like that. Not even sure it will get approved, because officially you're not allowed to increase overall volume programatically. It should always be a user action. And this doesn't work when volume is already set to max (which is quite common when connected to carkit) – guido Jul 14 '19 at 13:10

score -1 · Answer 3 · answered May 22 '20 at 10:00

Try this:

import Speech

try? AVAudioSession.sharedInstance().setCategory(.playback, mode: .default, options: [])

let utterance = AVSpeechUtterance(string: "Hello world")        
utterance.voice = AVSpeechSynthesisVoice(language: "en-GB")

let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(utterance)

Boost / increase volume of text to speech (AVSpeechUtterance) to make it louder

3 Answers3