
I have a class that uses the Android TTS API to convert text to audio. I can control the pitch and speed, but I noticed that the engine takes a HashMap of parameters in addition to the text string. Some words are pronounced too quickly to be easily recognized, and the inflection sounds unnatural. Is there a way to control these two things, possibly through the HashMap? This is how I'm using the engine:

        mTts = new TextToSpeech(Globals.context, this); // context, listener
    }

    @Override
    public void onInit(int status) {
        // Parameters passed to the engine alongside the text
        HashMap<String, String> myHashRender = new HashMap<>();
        myHashRender.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, speech);
        mTts.setPitch(0.8f);
        mTts.setSpeechRate(0.6f);
        mTts.synthesizeToFile(speech, myHashRender, fileOutPath);
        // Poll until the engine reports that it is no longer speaking
        while (mTts.isSpeaking()) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        mTts.stop();
        mTts.shutdown();
    }

motoku
  • Google TTS does not currently support changing inflection, nor does it support inline prosody tags as defined in [SSML](http://help.voxeo.com/go/help/xml.vxml.elements.prosody). It's possible that other TTS engines support these features, but I am not aware of any. – alanv Jun 05 '15 at 20:30
  • Then why does the method take a hashmap, and a string? – motoku Jun 06 '15 at 23:16
  • There are parameters you can set, but none of them control inflection or per-word prosody. – alanv Jun 08 '15 at 20:03
  • @alanv Do you think you can put that as an answer? – motoku Jun 09 '15 at 05:09

3 Answers


Google TTS does not currently support that, but here is what you can do: During parsing of your text, you can change parts of it to get the intonation and inflection you want.

For example, if you encounter the word 'Hey', you can rewrite it on the fly to 'Heeeey' before you send it to the TTS engine to get a different pronunciation.

It is not pretty but it is a workaround.
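The rewriting step could be sketched roughly as below. This is only an illustration of the idea, not part of any TTS API: the class name `SpeechTextRewriter` and its methods are made up for this example, and the respellings you need will depend on how your engine pronounces them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: swaps whole words for respelled forms before the
// text is handed to the TTS engine.
final class SpeechTextRewriter {
    // Insertion-ordered so replacements are applied in the order they were added
    private final Map<String, String> replacements = new LinkedHashMap<>();

    SpeechTextRewriter add(String word, String spokenForm) {
        replacements.put(word, spokenForm);
        return this;
    }

    String rewrite(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : replacements.entrySet()) {
            // \b keeps the match to whole words, so "They" is left alone
            Pattern wholeWord = Pattern.compile(
                    "\\b" + Pattern.quote(entry.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            result = wholeWord.matcher(result)
                    .replaceAll(Matcher.quoteReplacement(entry.getValue()));
        }
        return result;
    }
}
```

You would then pass the result of `rewrite(speech)` to `synthesizeToFile` instead of the raw text.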

DKIT
  • You might also consider using TtsSpan to change the metadata associated with certain words. IIRC, this does allow you to specify explicit pronunciation. – alanv Jun 11 '15 at 22:18
  • This is quite an old thread, but Google TTS still does not support SSML tags, as far as I can tell from searching through the documentation. I tried using some tags, and only one is working. I wonder, if it does not support SSML, how is this tag working? – Gurpreet Kaur Dec 14 '17 at 10:58

Google TTS does not currently support changing inflection, nor does it support inline prosody tags as defined in SSML. - alanv Jun 5 at 20:30

Community
motoku

Google TTS does not currently support changing inflection, nor does it support inline prosody tags as defined in SSML. While there are parameters you can set, none of them control inflection or per-word prosody.

There may be other engines that do support these features. eSpeak, for example, supports SSML tags and has an Android port available on the Play Store.
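If you try an engine that accepts SSML, the input might be built along these lines. This is a sketch only: the class `SsmlProsody` and its methods are hypothetical helpers, the `<prosody>` element and its `rate` and `pitch` attributes come from the W3C SSML specification, and whether a given engine honors them is something you would have to test.

```java
// Hypothetical helper that builds SSML strings; it does not call any TTS engine.
final class SsmlProsody {
    // Escape characters that are special in XML text and attribute values
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wrap text in a <prosody> element. SSML defines rate labels such as
    // "x-slow", "slow", "medium", "fast" and pitch labels such as "low", "high".
    static String prosody(String text, String rate, String pitch) {
        return "<prosody rate=\"" + escapeXml(rate)
                + "\" pitch=\"" + escapeXml(pitch) + "\">"
                + escapeXml(text) + "</prosody>";
    }

    // Wrap an SSML fragment in a <speak> root element
    static String speak(String innerSsml) {
        return "<speak version=\"1.0\" "
                + "xmlns=\"http://www.w3.org/2001/10/synthesis\">"
                + innerSsml + "</speak>";
    }
}
```

For example, `SsmlProsody.speak(SsmlProsody.prosody("Hey", "slow", "low"))` produces a document that asks for "Hey" to be spoken slowly and at a lower pitch, if the engine supports it.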

alanv