How can I control how Android TTS plays audio

Question

I have a class that uses the Android TTS API to convert text to audio. I can control the pitch and speed, but I noticed that the engine takes a HashMap of parameters in addition to the text string. Some words are pronounced too quickly to be easily recognized, and the inflection sounds unnatural. Is there a way I can control these two things, possibly through the HashMap? The following is how I'm using the engine:

        mTts = new TextToSpeech(Globals.context, this); // context, listener
    }

    @Override
    public void onInit(int status) {
        HashMap<String, String> myHashRender = new HashMap<>();
        myHashRender.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, speech);
        mTts.setPitch(0.8f);
        mTts.setSpeechRate(0.6f);
        mTts.synthesizeToFile(speech, myHashRender, fileOutPath);
        // Poll until the engine reports it is no longer busy.
        while (mTts.isSpeaking()) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        mTts.stop();
        mTts.shutdown();
    }

Google TTS does not currently support changing inflection, nor does it support inline prosody tags as defined in [SSML](http://help.voxeo.com/go/help/xml.vxml.elements.prosody). It's possible that other TTS engines support these features, but I am not aware of any. — alanv, Jun 05 '15 at 20:30
There are parameters you can set, but none of them control inflection or per-word prosody. — alanv, Jun 08 '15 at 20:03

score 4 · Accepted Answer · answered Jun 11 '15 at 08:53


Google TTS does not currently support that, but here is what you can do: During parsing of your text, you can change parts of it to get the intonation and inflection you want.

For example, if you encounter the word 'Hey', you can rewrite it on the fly to 'Heeeey' before you send it to the TTS engine to get a different pronunciation.

It is not pretty but it is a workaround.
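
The rewriting step described above can be kept in one place with a small lookup table. The following is only a minimal sketch, not code from the original answer: the class `PronunciationRewriter` and its methods are hypothetical names, and the respellings themselves have to be found by ear for whichever engine you use. It matches whole words only, so registering 'Hey' does not touch 'They'.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: swaps whole words for respellings before the text
// is handed to the TTS engine.
class PronunciationRewriter {
    // A "word" here is a run of letters and apostrophes.
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");

    private final Map<String, String> respellings = new HashMap<>();

    // Registers a respelling; lookup is case-insensitive.
    PronunciationRewriter add(String word, String respelling) {
        respellings.put(word.toLowerCase(Locale.ROOT), respelling);
        return this;
    }

    // Returns the text with every registered word replaced.
    String rewrite(String text) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String respelling = respellings.get(m.group().toLowerCase(Locale.ROOT));
            String replacement = (respelling != null) ? respelling : m.group();
            // quoteReplacement keeps '$' and '\' in the respelling literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        PronunciationRewriter rewriter = new PronunciationRewriter().add("Hey", "Heeeey");
        System.out.println(rewriter.rewrite("Hey, are you there?"));
    }
}
```

With the code from the question, the rewritten string would be passed in place of `speech`, for example `mTts.synthesizeToFile(rewriter.rewrite(speech), myHashRender, fileOutPath)`.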

answered Jun 11 '15 at 08:53 by DKIT


You might also consider using TtsSpan to change the metadata associated with certain words. IIRC, this does allow you to specify explicit pronunciation. – alanv Jun 11 '15 at 22:18
This is quite an old thread, but Google TTS still does not support SSML tags, as far as I can tell from searching the documentation. I tried some tags; only [tag name missing] works. If SSML is not supported, I wonder how this tag works? – Gurpreet Kaur Dec 14 '17 at 10:58

score 3 · Answer 2 · edited May 23 '17 at 10:26


Google TTS does not currently support changing inflection, nor does it support inline prosody tags as defined in SSML. - alanv Jun 5 at 20:30

answered Jun 10 '15 at 23:30 by motoku

score 0 · Answer 3 · answered Jun 11 '15 at 22:16

Google TTS does not currently support changing inflection, nor does it support inline prosody tags as defined in SSML. While there are parameters you can set, none of them control inflection or per-word prosody.

There may be other engines that do support these features. eSpeak, for example, supports SSML tags and has an Android port available on the Play Store.
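
For an engine that does accept SSML, the hard-to-recognize words could be wrapped in `<prosody>` elements before the text is sent. The following is a rough sketch under stated assumptions: `SsmlSketch` and `slowWords` are hypothetical names, the bare `<speak>` wrapper is an assumption (a strict SSML parser may also require `version` and `xmlns` attributes), and whether a given engine honors the markup has to be tested on that engine.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds an SSML string in which selected words are
// wrapped in <prosody rate="..."> so an SSML-aware engine can slow them down.
class SsmlSketch {
    // A "word" here is a run of letters and apostrophes.
    private static final Pattern WORD = Pattern.compile("[\\p{L}']+");

    // Escapes the characters that are unsafe in XML element content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // rate is inserted as-is; SSML 1.0 defines labels such as "x-slow" and "slow".
    static String slowWords(String text, String rate, String... words) {
        Set<String> targets = new HashSet<>();
        for (String w : words) {
            targets.add(w.toLowerCase(Locale.ROOT));
        }
        StringBuilder out = new StringBuilder("<speak>");
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            // Copy the punctuation and spacing between words, escaped.
            out.append(escapeXml(text.substring(last, m.start())));
            String word = escapeXml(m.group());
            if (targets.contains(m.group().toLowerCase(Locale.ROOT))) {
                out.append("<prosody rate=\"").append(rate).append("\">")
                   .append(word).append("</prosody>");
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(escapeXml(text.substring(last)));
        return out.append("</speak>").toString();
    }

    public static void main(String[] args) {
        System.out.println(slowWords("Turn left at Worcester & go", "slow", "Worcester"));
    }
}
```

The sketch keeps the same global pitch and speech rate settings from the question in play; the markup only targets the individual words that were too fast.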
