
I'm working to incorporate Apple's speech synthesis audio unit (which works only on macOS, not iOS) into AudioKit. I've built an AKSpeechSynthesizer class (initially created by wangchou in this pull request) and a demo project, both available on the develop branch of AudioKit.

My project is very similar to this Cocoa Speech Synthesis Example, but in that project the rate variable can be varied smoothly from a low number of words per minute (40) up to a high number (around 300). My project, however, starts at the default rate of 175, and any change slows the rate to a crawl, unless you set it all the way up to 350, in which case it goes extremely fast.

I can't see what I'm doing differently from that example, as both projects rely on

SetSpeechProperty(speechChannel, kSpeechRateProperty, newRate as NSNumber?)

to set the rate.

Here's my implementation and the working one.

The biggest difference is that my synthesizer is set up as an audio unit, whereas I believe the working example just uses the default speaker output.

The other parameters, frequency (pitch) and modulation (pitchMod), also exhibit strange behavior, but it's less noticeable on those, and they work a little oddly in both projects.

Can someone tell me why mine doesn't work, or fix it via a pull request? Any help would be greatly appreciated and attributed in the code.

Thanks!

Aurelius Prochazka

1 Answer


It seems that the rate, pitch, and modulation speech properties need to be integral values, without fractional parts, for everything to work properly.

The CocoaSpeechSynthesisExample project actually exhibits the same behaviour, but it initialises the rate field to an integral value. To reproduce the problem there, try setting the rate first to 333, and then to 333.3, for instance.

The pitch and modulation parameters appear to be equally picky about fractional parts and only yield reasonable results when set to integral values as well.
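The idea behind the workaround can be sketched in isolation. This is a minimal, hedged illustration, not AudioKit API; the helper name `integralSpeechValue` is my own. It snaps the Double to an integral value before bridging it to NSNumber, which is what the patch below does inline with `rounded()`:

```swift
import Foundation

// Hypothetical helper (not part of AudioKit) illustrating the fix:
// snap a speech parameter to an integral value before bridging it to
// NSNumber, since fractional values appear to confuse the speech channel.
func integralSpeechValue(_ value: Double) -> NSNumber {
    return NSNumber(value: value.rounded())
}

let safeRate = integralSpeechValue(333.3)  // holds 333, safe to pass on
// On macOS, the snapped value would then be handed to the speech channel:
// SetSpeechProperty(speechChannel, kSpeechRateProperty, safeRate)
```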

Unfortunately, I could not find any online reference documentation that confirms these findings, but here is a patch that makes the three speech parameters behave in the SpeechSynthesizer example project:

diff --git a/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift b/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift
index 81286b8fb..324966e13 100644
--- a/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift 
+++ b/AudioKit/Common/Nodes/Generators/Speech Synthesizer/AKSpeechSynthesizer.swift 
@@ -47,7 +47,7 @@ open class AKSpeechSynthesizer: AKNode {
                return
            }
            AKLog("Trying to set new rate")
-            let _ = SetSpeechProperty(speechChannel, kSpeechRateProperty, newRate as NSNumber?)
+            let _ = SetSpeechProperty(speechChannel, kSpeechRateProperty, newRate.rounded() as NSNumber?)
        }
    }

@@ -70,7 +70,7 @@ open class AKSpeechSynthesizer: AKNode {
                return
            }
            AKLog("Trying to set new freq")
-            let _ = SetSpeechProperty(speechChannel, kSpeechPitchBaseProperty, newFrequency as NSNumber?)
+            let _ = SetSpeechProperty(speechChannel, kSpeechPitchBaseProperty, newFrequency.rounded() as NSNumber?)
        }
    }

@@ -93,7 +93,7 @@ open class AKSpeechSynthesizer: AKNode {
                return
            }
            AKLog("Trying to set new modulation")
-            let _ = SetSpeechProperty(speechChannel, kSpeechPitchModProperty, newModulation as NSNumber?)
+            let _ = SetSpeechProperty(speechChannel, kSpeechPitchModProperty, newModulation.rounded() as NSNumber?)
        }
    }

It's just three extra calls to Swift's rounded() method.

  • Thank you, I have implemented your suggestions, and more here: https://github.com/AudioKit/AudioKit/commit/6a429df7e59059821ba5d3770430a71794533db8 Could I ask your advice on why the stop button is not stopping the speech playback? – Aurelius Prochazka Apr 08 '18 at 21:34
  • I couldn't get stop to function so far. I tried `PauseSpeechAt(speechChannel, kImmediate)`, `StopSpeechAt(speechChannel, kImmediate)`, and `SpeakCFString(speechChannel, "" as CFString, [ kSpeechNoSpeechInterrupt: false ] as CFDictionary)`. I also tried to enforce speech interrupt on play with `SpeakCFString(speechChannel, text as CFString, [ kSpeechNoSpeechInterrupt: false ] as CFDictionary)` before I ran out of ideas. It seems like all speech plays are enqueued and will be played in sequence, no matter what. Related: https://stackoverflow.com/questions/44730756/stop-audiounit-speech – Nicolas Tisserand Apr 09 '18 at 00:08
  • Yeah, well, I suppose I can at least add a volume control via AKBooster to do some control. Thanks again for looking into this! – Aurelius Prochazka Apr 09 '18 at 03:21
  • I added the callback with `SetSpeechProperty(speechChannel, kSpeechWordCFCallBack, callbackAddr)`. The callback logs the ranges of the text currently being spoken, and it shows all ranges being reported as spoken right after calling SpeakCFString. I don't know why the behavior is different from Apple's CocoaSpeechSynthesisExample; speech is stoppable in that example. – joshmori Apr 09 '18 at 04:28
  • Looks like the speechChannel obtained from the audio unit behaves differently from one created with NewSpeechChannel. https://lists.apple.com/archives/coreaudio-api/2016/Oct/msg00025.html – joshmori Apr 09 '18 at 06:33