
I have a requirement where I need to convert some text to audio using Google Text to Speech.

I am using Node.js to convert the text to an audio file, and I want to send the audio output to the front-end.

NodeJS code:

const client = new textToSpeech.TextToSpeechClient();
const request = {
  input: {text: 'Hello World'},
  // Select the language and SSML voice gender (optional)
  voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
  // select the type of audio encoding
  audioConfig: {audioEncoding: 'MP3'},
};

const [response] = await client.synthesizeSpeech(request);

response.audioContent contains the audio data which is a buffer object and looks like this:

<Buffer ff f3 44 c4 00 00 00 03 48 00 00 00 00 ff 88 89 40 04 06 3d d1 38 20 e1 3b f5 83 f0 7c 1f 0f c1 30 7f 83 ef 28 08 62 00 c6 20 0c 62 03 9f e2 77 d6 0f ... > 

I send this as an api response to the front-end. However, what I get in the front-end is a plain object with an array which looks like below:

{ "type": "Buffer", "data": [ 255, 243, 68, 196, 0, 0, 0, 3, 72, 0, 0, 0, 0, 255, 136, 137, 64, 4, 6, 61, 209, 56, 32, 225, 59, 245, 131, 240.......]}

My problems:

1) Since the data received by the front-end from the API is no longer a Buffer, how do I convert it back to a Buffer?

2) Once I have a proper buffer in the front-end, how do I use it to play the audio?

In my case, the text to be converted will always be 3-4 word phrases. So, I don't need any streaming ability.

My front-end is VueJS.

asanas
  • did you try sending it as an arraybuffer (.buffer), so you can just load it as Blob or ArrayBuffer on the frontend? new Uint8Array will get you the buffer you need for decodeAudioData or bloburl – user120242 May 22 '20 at 06:48
  • @user120242 I'll appreciate if you can help or provide a piece of code to do this. – asanas May 22 '20 at 07:00
  • isn't it already a buffer? – asanas May 22 '20 at 07:02
  • not the type of buffer you want. you should be sending raw responses to the client. that almost looks like you took the buffer object and stringified it – user120242 May 22 '20 at 07:03

3 Answers


(Don't do this; you should use what Terry has implemented for you.)

You should really be sending raw responses, so that you can ask fetch or XHR to give you a Blob or ArrayBuffer to play. You could even point an audio src="url" directly at the endpoint.
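As a rough sketch of the raw-response approach: the helper below asks fetch for a Blob and turns it into a Blob URL. The names fetchAudioUrl and playFromServer are hypothetical, and the injectable fetchImpl parameter exists only so the sketch can be run outside a browser; it assumes the endpoint returns the MP3 bytes directly rather than JSON.

```javascript
// Hypothetical helper: download raw audio bytes and return a Blob URL for them.
// `fetchImpl` is injectable only so the sketch can be exercised outside a browser.
async function fetchAudioUrl(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`Audio request failed with status ${res.status}`);
  const blob = await res.blob(); // raw bytes, no JSON round trip
  return URL.createObjectURL(blob);
}

// Browser usage: call from a click handler so playback is user-initiated.
async function playFromServer(url) {
  const blobUrl = await fetchAudioUrl(url);
  const audio = new Audio(blobUrl);
  // Release the Blob once playback has finished.
  audio.addEventListener('ended', () => URL.revokeObjectURL(blobUrl));
  await audio.play();
}
```

In a page this would be wired to a button, for example button.onclick = () => playFromServer(url), where url is whatever route serves the audio.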

Please note: although this code works, it is inefficient. The JSON array is around 3-4x the size of the original file. For context, Base64 would be around 1.33x.
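The size overhead quoted above can be checked with a small Node.js sketch. The sample bytes here are generated, not real audio, so the exact ratio for an actual MP3 may differ slightly; the point is only the order of magnitude.

```javascript
// Compare how large the same bytes become under different encodings.
const bytes = Buffer.alloc(3000);
for (let i = 0; i < bytes.length; i++) {
  bytes[i] = (i * 37) % 256; // deterministic sample data (not real audio)
}

const rawSize = bytes.length;
const jsonSize = JSON.stringify(bytes).length;      // {"type":"Buffer","data":[...]}
const base64Size = bytes.toString('base64').length; // 4 characters per 3 bytes

console.log('raw bytes:  ', rawSize);
console.log('JSON array: ', jsonSize, `(${(jsonSize / rawSize).toFixed(2)}x)`);
console.log('base64 text:', base64Size, `(${(base64Size / rawSize).toFixed(2)}x)`);
```

Each byte costs up to three digits plus a comma in the JSON array, which is where the 3-4x figure comes from.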

How this code works:
Node.js Buffers are really just Uint8Arrays with extensions. What you see in the question is the JSON representation of one. (I've generated a sample Buffer as the response variable.)
This converts the array of 8-bit integers back to a Uint8Array and then feeds it to decodeAudioData to play using the Web Audio API.
The Blob code just wraps the Uint8Array in a Blob, generates a Blob URL, and sets an audio tag's src to it. Using new Audio() would also work.
Note: Web Audio requires a user-initiated event (like a click) to play.
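The conversion described above can be sketched in isolation. This minimal example runs in Node.js and uses the first four bytes from the question as stand-in data; it only shows the round trip, not playback.

```javascript
// Server side (Node.js): JSON.stringify calls Buffer.prototype.toJSON().
const original = Buffer.from([0xff, 0xf3, 0x44, 0xc4]);
const wire = JSON.stringify(original);
console.log(wire); // {"type":"Buffer","data":[255,243,68,196]}

// Client side: parse the JSON and rebuild a typed array from the number list.
const parsed = JSON.parse(wire);
const audioData = new Uint8Array(parsed.data);

// audioData.buffer is an ArrayBuffer, which is what decodeAudioData expects.
console.log(audioData.buffer.byteLength); // 4
```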

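// `response` is the JSON-serialized Buffer defined in the HTML <script> below.
const audioData = new Uint8Array(response.data);

// Web Audio path: decode the MP3 bytes and play them (must be triggered by a user gesture).
function playbuf() {
  const audioCtx = new (window.AudioContext || window.webkitAudioContext)();
  const source = audioCtx.createBufferSource();
  // slice(0) passes a copy, because decodeAudioData detaches the ArrayBuffer it is given.
  audioCtx.decodeAudioData(audioData.buffer.slice(0), function (buffer) {
      source.buffer = buffer;
      source.connect(audioCtx.destination);
      source.start(0);
    },
    function (e) {
      console.log("Error decoding audio data: " + e);
    });
}

// Blob path: wrap the bytes in a Blob and point an <audio> element at its object URL.
const blob = new Blob([audioData], { type: 'audio/mpeg' });
const bloburl = URL.createObjectURL(blob);
document.body.insertAdjacentHTML('beforeend', `<audio controls src="${bloburl}"></audio>`);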
<button id="btn" onclick="playbuf()">play</button>
<script>
response = {"type":"Buffer","data":[255,251,148,100,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,88,105,110,103,0,0,0,15,0,0,0,11,0,0,7,200,0,15,15,15,15,15,15,15,15,15,30,30,30,30,30,30,30,30,30,57,57,57,57,57,57,57,57,57,103,103,103,103,103,103,103,103,103,141,141,141,141,141,141,141,141,141,179,179,179,179,179,179,179,179,179,194,194,194,194,194,194,194,194,194,210,210,210,210,210,210,210,210,210,225,225,225,225,225,225,225,225,225,240,240,240,240,240,240,240,240,240,255,255,255,255,255,255,255,255,255,0,0,0,60,76,65,77,69,51,46,57,56,114,4,175,0,0,0,0,0,0,0,0,52,32,36,5,192,141,0,1,204,0,0,7,200,219,4,56,191,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,251,20,100,0,15,240,0,0,105,0,0,0,8,0,0,13,32,0,0,1,0,0,1,254,0,0,0,32,0,0,52,128,0,0,4,1,129,64,192,80,40,28,14,6,0,0,0,0,15,66,122,183,142,234,91,223,204,254,76,167,230,103,224,20,10,5,27,151,84,170,50,226,113,199,50,189,105,55,83,106,40,52,252,51,37,41,57,70,86,78,225,158,187,57,185,255,251,20,100,30,15,240,0,0,127,128,0,0,8,0,0,13,32,0,0,1,0,0,1,254,20,0,0,32,0,0,52,130,128,0,4,153,153,108,241,245,146,67,211,168,240,89,179,110,28,65,59,203,3,128,144,111,12,41,206,0,151,99,237,36,247,65,98,52,20,210,37,6,244,98,75,8,202,32,138,109,98,26,126,234,211,195,244,48,226,199,103,116,108,238,127,255,251,68,100,60,0,0,218,8,231,238,8,0,112,0,0,13,32,192,0,0,27,157,123,75,185,188,162,0,0,0,52,131,0,0,0,176,211,74,125,109,225,23,137,197,229,110,92,142,4,116,223,183,113,244,152,116,90,107,237,17,151,210,103,150,235,223,140,191,108,178,41,239,229,35,179,122,110,83,59,142,176,215,210,90,195,63,150,75,154,196,8,252,87,236,110,109,165,72,225,151,198,253,106,92,63,63,207,24
9,251,213,62,29,223,105,233,237,209,190,144,52,190,180,166,130,237,201,232,106,123,255,125,255,231,127,249,255,2,50,202,89,250,78,119,120,219,195,92,100,49,151,22,164,59,91,58,88,221,141,204,181,255,255,244,42,103,115,16,255,251,116,100,1,128,244,92,94,211,255,109,0,10,0,0,13,32,224,0,1,13,153,19,69,237,164,109,96,0,0,52,128,0,0,4,2,48,146,181,214,95,140,56,84,179,161,130,161,3,171,236,28,28,4,1,8,1,77,65,24,9,136,18,154,103,169,165,3,38,43,77,138,202,165,194,162,181,200,114,56,61,18,26,13,129,181,242,72,169,161,234,175,254,176,215,74,220,59,113,148,204,42,49,65,225,230,179,69,178,11,91,50,199,5,18,57,228,216,102,52,88,87,86,105,91,90,105,85,94,86,14,152,107,21,37,14,131,174,77,133,213,38,142,36,85,163,105,165,36,214,184,126,235,36,30,197,152,88,86,30,230,67,149,215,143,255,255,254,46,249,38,88,103,2,0,3,56,187,253,182,160,73,88,132,40,183,225,6,73,248,96,33,105,202,142,41,18,220,208,200,201,39,207,24,20,160,69,239,28,109,69,23,148,232,232,148,89,10,107,34,157,179,14,132,218,40,215,172,203,5,61,91,209,46,79,99,210,115,48,211,143,93,93,180,130,209,25,99,95,41,38,100,202,106,227,27,57,183,200,244,225,24,149,87,55,84,146,29,249,218,50,134,14,129,158,168,96,136,139,132,170,255,180,3,179,77,255,255,251,100,100,3,0,243,54,74,209,107,26,26,170,0,0,13,32,0,0,1,11,216,221,63,172,24,108,232,0,0,52,128,0,0,4,247,112,16,231,208,40,224,217,195,130,165,90,9,64,243,48,116,210,50,124,235,19,105,146,248,197,60,228,99,46,49,153,90,200,109,212,88,41,202,134,35,153,40,54,114,98,185,250,19,234,96,196,241,227,1,216,249,132,133,77,129,17,90,22,90,177,27,158,74,93,218,49,159,166,167,192,166,231,155,98,152,152,205,116,115,58,143,213,9,208,228,161,60,100,3,18,246,127,64,19,75,53,246,219,3,53,122,68,140,106,114,49,4,16,183,216,57,12,45,58,73,24,10,251,245,152,64,148,198,22,157,53,105,230,93,106,130,89,65,10,189,13,84,17,117,121,154,90,81,3,129,212,43,13,141,92,203,60,255,202,32,66,144,245,205,179,44,162,189,36,14,109,4,75,181,0,99,169,117,240,204,196,75,17,122,157,119,210,201,142,120,250,88,9
8,0,7,85,105,191,182,72,21,26,195,255,251,100,100,6,0,243,67,78,79,251,38,27,216,0,0,13,32,0,0,1,10,189,23,57,231,176,106,160,0,0,52,128,0,0,4,0,12,7,10,181,0,197,182,52,247,100,77,89,97,12,57,130,34,158,197,255,136,92,44,243,139,222,106,83,50,154,146,46,13,88,167,172,153,222,121,183,191,94,77,93,139,96,231,16,43,51,102,217,113,224,37,241,88,105,90,27,2,204,247,5,112,85,55,61,86,69,132,156,47,222,145,82,57,153,121,243,148,146,38,69,159,153,169,150,9,5,98,164,236,198,2,208,205,255,219,219,6,84,46,32,202,1,24,4,160,65,19,243,0,225,101,10,144,130,248,118,89,180,23,119,224,171,22,44,122,141,185,107,73,133,83,227,107,175,84,139,213,86,168,146,102,246,85,85,254,74,76,198,74,93,215,225,222,198,111,255,242,141,252,58,171,255,207,148,187,84,42,212,59,149,137,67,90,42,5,59,20,22,225,192,0,0,132,202,82,197,29,196,148,15,66,13,92,243,44,183,131,65,215,255,251,20,100,12,135,240,227,7,202,104,47,97,24,0,0,13,32,0,0,1,0,152,1,91,192,0,0,32,0,0,52,128,0,0,4,125,37,162,192,0,1,255,18,85,76,65,77,69,51,46,57,56,46,52,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,255,251,20,100,25,143,240,0,0,127,128,0,0,8,0,0,13,32,0,0,1,0,0,1,254,0,0,0,32,0,0,52,128,0,0,4,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,255,251,20,100,55,143,240,0,0,127,128,0,0,8,0,0,13,32,0,0,1,0,0,1,254,0,0,0,32,0,0,52,128,0,0,4,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,255,251,20,100,85,143,240,0,0,127,128,0,0,8,0,0,13,32,0,0,1,0,0,1,164,0,0,0,32,0,0,52,128,0,0,4,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,255,251,20,100,115,143,240,0,0,105
,0,0,0,8,0,0,13,32,0,0,1,0,0,1,164,0,0,0,32,0,0,52,128,0,0,4,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85,85]}
</script>
user120242
  • When I try this, I get the error 'source' is undefined. – asanas May 22 '20 at 09:28
  • I can help you make this work if it's what you really need, but I _really_ recommend you use Terry's full back to front solution. – user120242 May 22 '20 at 09:30
  • Ok, I'll try Terry's solution and will let you know. Thanks. – asanas May 22 '20 at 09:31
  • But really, if your solution works for me, that's all I need. All my audios are going to be just 2-3 words. – asanas May 22 '20 at 09:33
  • alright, just be prepared that this will encode the files to be roughly 2x(-3x because of the spaces and commas) in size – user120242 May 22 '20 at 09:36
  • forgot to put in the audio source creation, it's in now. the blob part should work too to create an audio tag – user120242 May 22 '20 at 09:41
  • I get this error - Uncaught (in promise) DOMException: Unable to decode audio data – asanas May 22 '20 at 10:01
  • So I have a fully working demo here, where I've taken a `Buffer` of an mp3 file that I've read out using `JSON.stringify(fs.readFileSync('mp3.mp3'))`, and then pasted it as response above. So all else being the same it should work. Note that playing audio _must be initiated by a user event like a click_. And again, to emphasize: file size bloats up to 3-4 times in size. I think at this point though, you'd be better off base64 encoding it and playing it like that. It would be around 1.33x in size – user120242 May 22 '20 at 10:23
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/214457/discussion-between-asanas-and-user120242). – asanas May 23 '20 at 02:13

You can download the audio and play it using an HTML audio player.

We need two files: index.js (Node.js code) and index.html (Vue.js client).

This will synthesize the text you input and play it.

Run the node script and go to http://localhost:8000/ to see the demo.

You could omit the "controls" attribute to hide the audio player; it should still play the sound.

index.js

const express = require("express");
const port = 8000;
const app = express();
const stream = require("stream");
const textToSpeech = require('@google-cloud/text-to-speech');

app.use(express.static("./"));

app.get('/download-audio', async (req, res) => {

    const textToSynthesize = req.query.textToSynthesize;
    console.log("textToSynthesize:", textToSynthesize);

    const client = new textToSpeech.TextToSpeechClient();
    const request = {
        input: {text: textToSynthesize || 'Hello World'},
        // Select the language and SSML voice gender (optional)
        voice: {languageCode: 'en-US', ssmlGender: 'NEUTRAL'},
        // Select the type of audio encoding
        audioConfig: {audioEncoding: 'MP3'},
    };

    try {
        const [response] = await client.synthesizeSpeech(request);
        console.log(`Audio synthesized, content-length: ${response.audioContent.length} bytes`);

        // Send the raw MP3 bytes (not JSON) so the browser can play them directly.
        res.set("Content-Disposition", 'attachment; filename="audio.mp3"');
        res.set("Content-Type", "audio/mpeg");

        const readStream = new stream.PassThrough();
        readStream.end(response.audioContent);
        readStream.pipe(res);
    } catch (err) {
        // Without this, a rejected promise could leave the request hanging
        // (Express 4 does not catch errors thrown in async handlers).
        console.error("Synthesis failed:", err);
        res.status(500).send("Synthesis failed");
    }
});

app.listen(port);
console.log(`Serving at http://localhost:${port}`);

index.html

<!DOCTYPE html>
<html>
<body>
<script src="https://unpkg.com/vue@2.2.6/dist/vue.js"></script>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css">
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>

<div class="container m-3" id="app">
<h2>Speech synthesis demo</h2>
<h4>Press synthesize and play to hear</h4>
<audio :src="audio" ref="audio" controls autoplay>
</audio>
<div class="form-group">
    <label for="text">Text to synthesize:</label>
    <input type="text" class="form-control" v-model="synthesisText" placeholder="Enter text" id="text">
</div>
<div>
    <button @click="downloadAudio">Synthesize and play</button>
</div>
</div>

<script>

    new Vue({
        el: "#app",
        data: {
            audio: null,
            synthesisText: "Gatsby believed in the green light, the orgiastic future that year by year recedes before us."
        },
        methods: {
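            downloadAudio() {
                this.audio = "/download-audio?textToSynthesize=" + encodeURIComponent(this.synthesisText);
                // Vue applies the new src on the next DOM update, so wait before loading and playing.
                this.$nextTick(() => {
                    this.$refs.audio.load();
                    this.$refs.audio.play();
                });
            }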
        }
    });

</script>
</body>
</html> 
Terry Lennox
  • what does the this.audio contain? The api call returns an object. And I get the error: http://localhost:3001/[object%20Object] net::ERR_ABORTED 404 (Not Found) – asanas May 23 '20 at 02:10
  • The audio should contain whatever is written in the input field, do you see a form when you go to http://localhost:8000? Thanks! – Terry Lennox May 23 '20 at 02:32
  • https://chat.stackoverflow.com/rooms/214457/discussion-between-asanas-and-user120242 can you hop in on here? or this link.. i don't know how this works: https://chat.stackoverflow.com/rooms/info/214457/discussion-between-asanas-and-user120242 so for some odd reason this.audio is returning an [object] and it's getting coerced into a string – user120242 May 23 '20 at 02:34
  • Sorry to bother you again, but how do I even authenticate this stream URl. I'm using password for authenticating other routes in my node app. But if I apply that middleware to this url, it just fails with 401. – asanas May 23 '20 at 03:54
  • You could use a query parameter to authenticate, and populate when you set the text to synthesize. E.g. token=xyz. – Terry Lennox May 23 '20 at 04:00
  • Ok I'll do that. – asanas May 23 '20 at 04:47

There is no need to use AudioContext; just create a new Audio() from the Blob. Frontend:

api_get({
  route:'synthesize',
  body:{
    ssml:`
    <speak>
      <emphasis level="strong">Ser</emphasis>
      <break time="300ms"/>
      ou não ser,
      <break time="600ms"/>
      <emphasis level="moderate">eis</emphasis>
      a questão.
    </speak>`
  }
})
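.then((response)=>{
  // audioContent arrives as a JSON-serialized Buffer: {type: 'Buffer', data: [...]}.
  // Note: with a plain axios instance the payload would be under response.data instead.
  const audioData = new Uint8Array(response.audioContent.data);
  const blob = new Blob([audioData],{type:'audio/mpeg'});
  new Audio( URL.createObjectURL(blob) ).play()
})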

Axios in the middle:

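// Despite its name, this helper issues a POST through the axios instance `api`.
const api_get = (request, headers) => {
  const route = request.route
  // api.post already returns a promise, so no extra Promise wrapper is needed.
  return api.post("/api/"+route+"/", request.body, headers)
    .then((data) => data || null)
    .catch(error => {
      console.log(error)
      // Re-throw so callers can see why the request failed.
      throw error
    })
}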

Node.js backend with Express:

router.route('/synthesize').post( (req,res)=>{
    const request = {
        audioConfig: {
            audioEncoding: "MP3",
            pitch: -1.00,
            speakingRate: 1
        },
        input: {
            text: req.body.text?req.body.text:null,
            ssml: req.body.ssml?req.body.ssml:null
        },
        voice: {
            languageCode: "pt-BR",
            name: "pt-BR-Standard-B"
        }
    }
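    client.synthesizeSpeech(request).then((data)=>{
        // data[0].audioContent is a Buffer; res.send serializes it as {type:'Buffer', data:[...]}.
        res.send(data[0])
    }).catch((err)=>{
        // Reply with an error instead of leaving the request hanging.
        console.error(err)
        res.status(500).send('Synthesis failed')
    })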
})
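
If the audio has to travel inside a JSON body, as it does in this answer, base64 is a more compact encoding than the number array. The following is a minimal sketch under that assumption, not part of the code above: the audioBase64 field name and the base64ToBytes helper are hypothetical, and the four sample bytes stand in for response.audioContent.

```javascript
// Server side (Node.js): encode the audio bytes as a base64 string for the JSON body.
const audioContent = Buffer.from([0xff, 0xf3, 0x44, 0xc4]); // stand-in for response.audioContent
const body = JSON.stringify({ audioBase64: audioContent.toString('base64') });

// Client side: decode the base64 string back into bytes.
function base64ToBytes(b64) {
  const binary = atob(b64); // one character per byte
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

const bytes = base64ToBytes(JSON.parse(body).audioBase64);
console.log(Array.from(bytes)); // the original four bytes: 255, 243, 68, 196
```

In the browser, the resulting bytes can be wrapped in a Blob and played with new Audio() exactly as in the frontend code above.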