
I'm using Dialogflow for speech recognition and intent detection, but when I get a response that contains output audio, I can't find a way to play it. The audio comes back as some kind of byte array. The JSON object looks like this (it's the `data` property that I'm trying to convert to audio):

{
   "type": "Buffer",
   "data": [255,251,16,196,0,0,0,0,1,164,20,0,0,32,...],
   "latency": 2906,
   "fulfillmentMessages": ["Here's your Adele playlist."],
   "parameters": {
      "any": [],
      "music-artist": ["Adele"]
   },
   "success": true
}

I already tried converting it to an ArrayBuffer and then decoding it, but that didn't seem to work either:

  playByteArray(byteArray) {
    // Copy the plain number array into an ArrayBuffer
    var arrayBuffer = new ArrayBuffer(byteArray.length);
    var bufferView = new Uint8Array(arrayBuffer);
    for (let i = 0; i < byteArray.length; i++) {
      bufferView[i] = byteArray[i];
    }
    // Decode the encoded audio and hand the resulting AudioBuffer to play()
    let context = new AudioContext();
    context.decodeAudioData(
      arrayBuffer,
      function(buffer) {
        this.play(buffer);
      }.bind(this)
    );
  }

  play(buf) {
    // Create a source node from the buffer
    let context = new AudioContext();
    var source = context.createBufferSource();
    source.buffer = buf;
    // Connect to the final output node (the speakers)
    source.connect(context.destination);
    // Play immediately
    source.start(0);
  }
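
For reference, this is roughly the Blob / `<audio>`-element variant from the answer linked in the comments below (just a sketch; the method name and the MIME type are my own guesses, since I didn't know at this point which encoding the backend was requesting):

  playViaAudioElement(byteArray) {
    // Wrap the raw bytes in a Blob and let an <audio> element decode it
    const blob = new Blob([new Uint8Array(byteArray)], { type: "audio/mpeg" });
    const url = URL.createObjectURL(blob);
    const audio = new Audio(url);
    // Release the object URL once playback has finished
    audio.onended = () => URL.revokeObjectURL(url);
    audio.play();
  }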

Edit: this is a JSON example I get back from Dialogflow: https://drive.google.com/open?id=1Y2UegyJ9BEwL6AR77Skly7prA4UalsNM

  • Might [this answer](https://stackoverflow.com/questions/14908838/loading-an-audio-buffer-and-play-it-using-the-audio-tag) help you? – TKoL Jun 03 '19 at 11:54
  • I already tried that, and unfortunately it didn't work. If you look at my play function, it basically does the same thing. I also tried `.noteOn(0)`, but that throws an error saying that function doesn't exist on `source`. – Sharon Jun 03 '19 at 12:01
  • Looking at your sample data, it appears the audio data is only a couple of hundred bytes in length. That doesn't "feel" right. When you say you play it, you don't hear any sound ... could this data be bad? If we think that data is PCM then assuming only 8KHz fidelity, that would be 8000 data points per second of audio output. – Kolban Jun 04 '19 at 16:54
  • Thanks a lot, Kolban! Your hint made me check what the backend looks like; it seems they had some wrong configuration for Dialogflow. – Sharon Jun 05 '19 at 08:50

1 Answer


It turned out the problem was in the backend: the Dialogflow request had some wrong configuration. The `outputAudioConfig` in the request should look like this:

outputAudioConfig: {
  audioEncoding: "OUTPUT_AUDIO_ENCODING_LINEAR_16",
  sampleRateHertz: 44100
}

but `audioEncoding` had been set to `OUTPUT_AUDIO_ENCODING_MP3` instead.

Besides that, they had also added a parameter that shouldn't be there:

queryParams: {
  payload: structjson.jsonToStructProto({ source: "ACTIONS_ON_GOOGLE" }) // Let's pretend to be Google
}

After removing this parameter, the code I listed in my question worked. Thanks to @Kolban for pointing out that the response array wasn't correct to begin with.
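
For context, the corrected backend call ends up looking roughly like this (a sketch, assuming the official `dialogflow` Node.js client; the function name, `projectId`, `sessionId` and `query` are placeholders):

const dialogflow = require("dialogflow");

async function detectIntentWithAudio(projectId, sessionId, query) {
  const sessionClient = new dialogflow.SessionsClient();
  const sessionPath = sessionClient.sessionPath(projectId, sessionId);

  const request = {
    session: sessionPath,
    queryInput: {
      text: { text: query, languageCode: "en-US" }
    },
    // Linear PCM output instead of MP3
    outputAudioConfig: {
      audioEncoding: "OUTPUT_AUDIO_ENCODING_LINEAR_16",
      sampleRateHertz: 44100
    }
    // ...and no queryParams payload pretending to be Actions on Google
  };

  const [response] = await sessionClient.detectIntent(request);
  // response.outputAudio is the byte buffer that ends up as "data" in the JSON
  return response.outputAudio;
}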
