
On March 1, 2021, Google Cloud Text-to-Speech released beta features, including support for the SSML `<voice>` tag with `name` or `lang` attributes.

I'm hoping to use these beta features, but I can't figure out what channel they were released to or how to access them. I haven't found any breadcrumbs in the documentation that would lead me to them.

I noticed that on the TTS product home page, the demo uses v1beta1 but doesn't support the `<voice>` tag. (Screenshot: the JSON from the TTS demo, with the `<voice>` tag stripped out.)

That is, for the SSML:

<speak>
Blah Blah English Text. <voice name="ko-KR-Wavenet-D"> Blah Blah Korean Text.</voice> <break time="400ms" /> Blah Blah English Text.
</speak>

the demo shows the following json request body:

{
  "audioConfig": {
    "audioEncoding": "LINEAR16",
    "pitch": 0,
    "speakingRate": 1
  },
  "input": {
    "ssml": "<speak> Blah Blah English Text. Blah Blah Korean Text. <break time=\"400ms\" /> Blah Blah English Text. </speak>"
  },
  "voice": {
    "languageCode": "en-US",
    "name": "en-US-Wavenet-D"
  }
}

What we've tried

In our own script, which uses the Google Text-to-Speech API to generate audio from a CSV cue sheet, we've historically used the general release. The script still works when we switch to v1beta1, but the `<voice>` tag still doesn't function. We're using the npm package, symlinked to nodejs-text-to-speech master.

Our script uses:

const textToSpeech = require('@google-cloud/text-to-speech');

and, for the general release:

const client = new textToSpeech.TextToSpeechClient();

We've been trying to access the March 1 beta features with:

const client = new textToSpeech.v1beta1.TextToSpeechClient();
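For reference, here is a minimal sketch of the request we're constructing (an assumption: this mirrors our actual script; `buildRequest` is just an illustrative helper, only the request body is built, and no API call is made here):

```javascript
// Sketch: building the request our script would pass to the beta client.
// buildRequest is a hypothetical helper for illustration only.
function buildRequest(ssml) {
  return {
    input: { ssml },
    // Default voice for text outside any <voice> tag
    voice: { languageCode: 'en-US', name: 'en-US-Wavenet-D' },
    audioConfig: { audioEncoding: 'LINEAR16' },
  };
}

const ssml =
  '<speak>Blah Blah English Text. ' +
  '<voice name="ko-KR-Wavenet-D">Blah Blah Korean Text.</voice> ' +
  '<break time="400ms"/> Blah Blah English Text.</speak>';

const request = buildRequest(ssml);
console.log(request.voice.name); // en-US-Wavenet-D
```

With the beta client, this request would then be passed to `client.synthesizeSpeech(request)`.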

  • Did you happen to figure it out? – Joshua Crowley Jun 18 '21 at 01:11
  • @JoshuaCrowley Nope -- any ideas? We had to move on to higher priorities some time ago, but still hoping to get some clues and figure this out. – DubiousDesigns Jun 18 '21 at 06:18
  • I have a similar issue with the `` tag, which should be supported since v1beta1. While investigating my issue, I tried your example text in the TTS demo page and in my own Java-based client. Neither worked. BUT: when I added the opening `` tag, which is missing in your example, it worked in my client. It still does not work on the demo page, though. Can you try your client again with the opening tag? – Lena Schimmel Sep 28 '21 at 09:34
  • @LenaSchimmel That was a typo on my part in the OP, but in the actual SSML we're using, I have the opening `` tag. I just tried again using our script, and it's still not working. In your java-based client, you were able to get two different voices using the `` tag? – DubiousDesigns Sep 28 '21 at 20:04
  • Yes, in my Java-based client, the SSML example from your post and the SSML example from the new post by @Sandeep-Mohanty work. – Lena Schimmel Sep 29 '21 at 08:23

2 Answers


According to the release notes of the Text-to-Speech API, the `<voice>` tag works as expected. I tried replicating the scenario on my end using the Node.js client library, and it works as expected.

The SSML documentation says the `<voice>` tag lets you use more than one voice in a single SSML request. In my code the default voice is an English male, and for the other voice I used `<voice name="en-IN-Wavenet-D">`, which is a female voice, so I get two different voices in my output.mp3 file.

You can refer to the below Node.js code and output audio file.

tts1.js

// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');
// Import other required libraries
const fs = require('fs');
const util = require('util');

// Creates a client using the v1beta1 surface
const client = new textToSpeech.v1beta1.TextToSpeechClient();

async function quickStart() {
  // The SSML to synthesize
  const ssml =
    '<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday </voice><break time="250ms"/> in her sweet and gentle voice.</speak>';

  // Construct the request
  const request = {
    input: {ssml: ssml},
    // Select the language and SSML voice gender (optional)
    voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
    // Select the type of audio encoding
    audioConfig: {audioEncoding: 'MP3'},
  };

  // Performs the text-to-speech request
  const [response] = await client.synthesizeSpeech(request);
  // Write the binary audio content to a local file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile('output.mp3', response.audioContent, 'binary');
  console.log('Audio content written to file: output.mp3');
}

quickStart();

Output mp3 file : output1 (using v1beta1)

I have also tried it without the v1beta1 version in Node.js, and it works fine.

tts2.js:

// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');
// Import other required libraries
const fs = require('fs');
const util = require('util');

// Creates a client using the general-release surface
const client = new textToSpeech.TextToSpeechClient();

async function quickStart() {
  // The SSML to synthesize
  const ssml =
    '<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday </voice><break time="250ms"/> in her sweet and gentle voice.</speak>';

  // Construct the request
  const request = {
    input: {ssml: ssml},
    // Select the language and SSML voice gender (optional)
    voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
    // Select the type of audio encoding
    audioConfig: {audioEncoding: 'MP3'},
  };

  // Performs the text-to-speech request
  const [response] = await client.synthesizeSpeech(request);
  // Write the binary audio content to a local file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile('output.mp3', response.audioContent, 'binary');
  console.log('Audio content written to file: output.mp3');
}

quickStart();

Output mp3 file : output (without v1beta1 version)

Apart from this, I have also tried the Python client library, and it works as expected there too.

file1.py

from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the SSML input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
    ssml='<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday</voice><break time="250ms"/> in her sweet and gentle voice.</speak>'
)

# Build the voice request, select the language code ("en-US") and the
# SSML voice gender ("neutral")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)

# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request on the SSML input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

output file : output (using Python)

Sandeep Mohanty

Using the Google Cloud npm package, I'm trying to use the FEMALE voice, but the output always comes back with the NEUTRAL voice.

// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');

// Creates a client
const client = new textToSpeech.TextToSpeechClient();

export async function quickStart(text) {
    const request = {
        input: { text: text },
        // Select the language and SSML voice gender (optional)
        voice: { languageCode: 'en-US', ssmlGender: 'FEMALE' },
        // Select the type of audio encoding
        audioConfig: { audioEncoding: 'MP3' },
    };

    // Performs the text-to-speech request
    const [response] = await client.synthesizeSpeech(request);
    console.log(response);
    return response;
}
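One possible cause (an assumption, based on how the API describes voice selection): `ssmlGender` is only a preference, not a guarantee, so the service may substitute another voice when none is pinned down. Naming a specific voice should force it. A minimal sketch of such a request body; `buildFemaleRequest` is a hypothetical helper, `en-US-Wavenet-F` is assumed to be one of the female en-US voices, and no API call is made here:

```javascript
// Sketch: pin the voice by name instead of relying on ssmlGender alone.
// buildFemaleRequest is a hypothetical helper for illustration only.
function buildFemaleRequest(text) {
  return {
    input: { text: text },
    // Naming the voice selects it directly; ssmlGender is just a preference
    voice: { languageCode: 'en-US', name: 'en-US-Wavenet-F' },
    audioConfig: { audioEncoding: 'MP3' },
  };
}

console.log(buildFemaleRequest('Hello').voice.name); // en-US-Wavenet-F
```

The resulting object would be passed to `client.synthesizeSpeech(...)` exactly as in the snippet above.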
saba