According to the release notes of Text-to-Speech API the <voice>
tag is working as expected. I tried replicating the scenario on my end using Node.js client library and it is working as expected.
The document of SSML says the <voice>
tag allows you to use more than one voice in a single SSML request. In my code I have used the default voice as English Male and for the other voice I have used the <voice name="hi-IN-Wavenet-D">
which is a female voice and I am getting two different voices in my output.mp3 file.
You can refer to the below Node.js code and output audio file.
tts1.js
// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');
// Import other required libraries
const fs = require('fs');
const util = require('util');
// Creates a client
const client = new textToSpeech.v1beta1.TextToSpeechClient();
async function quickStart() {
// The text to synthesize
const ssml = '<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday </voice><break time="250ms"/> in her sweet and gentle voice.</speak>'
// Construct the request
const request = {
input: {ssml: ssml},
// Select the language and SSML voice gender (optional)
voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
// select the type of audio encoding
audioConfig: {audioEncoding: 'MP3'},
};
// Performs the text-to-speech request
const [response] = await client.synthesizeSpeech(request);
// Write the binary audio content to a local file
const writeFile = util.promisify(fs.writeFile);
await writeFile('output.mp3', response.audioContent, 'binary');
console.log('Audio content written to file: output.mp3');
}
quickStart();
Output mp3 file : output1 (using v1beta1)
I have also tried without using the v1beta1 version in node.js and it is working fine.
tts2.js:
// Imports the Google Cloud client library
const textToSpeech = require('@google-cloud/text-to-speech');
// Import other required libraries
const fs = require('fs');
const util = require('util');
// Creates a client
const client = new textToSpeech.TextToSpeechClient();
async function quickStart() {
// The text to synthesize
const ssml = '<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday </voice><break time="250ms"/> in her sweet and gentle voice.</speak>'
// Construct the request
const request = {
input: {ssml: ssml},
// Select the language and SSML voice gender (optional)
voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
// select the type of audio encoding
audioConfig: {audioEncoding: 'MP3'},
};
// Performs the text-to-speech request
const [response] = await client.synthesizeSpeech(request);
// Write the binary audio content to a local file
const writeFile = util.promisify(fs.writeFile);
await writeFile('output.mp3', response.audioContent, 'binary');
console.log('Audio content written to file: output.mp3');
}
quickStart();
Output mp3 file : output (without v1beta1 version)
Apart from this, I would like to inform you that I have also tried using the Python client library and it is also working as expected.
file1.py
from google.cloud import texttospeech
# Instantiates a client
client = texttospeech.TextToSpeechClient()
# Set the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
ssml= '<speak>And then she asked, <voice name="en-IN-Wavenet-D"> where were you yesterday</voice><break time="250ms"/> in her sweet and gentle voice.</speak>'
)
# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.VoiceSelectionParams(
language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)
# Select the type of audio file you want returned
audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.MP3
)
# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
input=synthesis_input, voice=voice, audio_config=audio_config
)
# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
# Write the response to the output file.
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')
output file : output (using Python)