I'm using the Speech-to-Text API to transcribe a live microphone recording. This works well with plain LINEAR16 encoding, but I need to reduce the bandwidth, so I switched to OGG Opus encoding.
The audio is recorded and encoded as OGG Opus client-side, using this library.
It is then sent to a backend app over a websocket.
Finally, the backend app calls the STT API in streaming mode, which stays silent (no error, but no transcript in the output either).

These are the options for the encoding:

var options = {
    monitorGain: 0,
    recordingGain: 1,
    numberOfChannels: 1,
    encoderSampleRate: 16000,
    encoderPath: "./javascript/ogg_opus/encoderWorker.min.js",
    originalSampleRateOverride: 16000,
    streamPages: true,
    encoderApplication: 2048
};

This is the configuration that is sent to the API:

{
   encoding: 'OGG_OPUS',
   language: 'fr',
   rate: 16000
}

This is how the audio is sent over the websocket:

recorder.ondataavailable = function(typedArray){
    var dataBlob = new Blob([typedArray], { type: 'audio/ogg' });
    websocket.emit('audio_data', dataBlob);
};

The backend part is developed in Python, following this example.
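
One way to narrow this down on the backend is to check what actually arrives over the websocket before it is forwarded to the API. The sketch below is only a diagnostic, not part of the linked example: it assumes the first websocket message begins with the first Ogg page of the stream, and it reads the `OpusHead` packet using the layout from RFC 7845. The function name `describe_ogg_opus_header` is made up for illustration.

```python
import struct


def describe_ogg_opus_header(data: bytes) -> dict:
    """Parse the first Ogg page and the OpusHead packet it should carry.

    Assumes `data` starts at the very beginning of the Ogg stream.
    Raises ValueError if the bytes do not look like Ogg Opus.
    """
    # An Ogg page header is 27 bytes, followed by a segment table.
    if len(data) < 27 or data[:4] != b"OggS":
        raise ValueError("data does not start with an Ogg page ('OggS')")

    header_type = data[5]        # bit 0x02 marks the beginning of a stream
    segment_count = data[26]     # number of entries in the segment table
    payload_start = 27 + segment_count
    payload_len = sum(data[27:payload_start])
    payload = data[payload_start:payload_start + payload_len]

    # The first packet of an Ogg Opus stream is the 19+ byte OpusHead packet.
    if len(payload) < 19 or payload[:8] != b"OpusHead":
        raise ValueError("first Ogg page does not carry an OpusHead packet")

    # version (u8), channel count (u8), pre-skip (u16 LE), input rate (u32 LE)
    version, channels, pre_skip, input_rate = struct.unpack_from("<BBHI", payload, 8)
    return {
        "begins_stream": bool(header_type & 0x02),
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        "input_sample_rate": input_rate,
    }
```

If this raises on the first chunk received, the data was probably altered or wrapped somewhere between the recorder and the backend, rather than rejected by the API because of its encoding options.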

Do you know the OGG Opus configuration needed to get the API to work?

Alexis MP
eli0tt
A few things. First, the config fields should really be language_code instead of language and sample_rate_hertz instead of rate (although you should get an error for those). Second, you should test a generated file with the Speech-to-Text API in synchronous mode and see if that works (it should). Finally, do you really need the streaming API with an audio stream, rather than simply sending a file or even using the synchronous API? Bottom line: I don't think the problem is with the configuration of the API calls. – Alexis MP Dec 20 '19 at 10:40
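
The field renaming suggested in this comment can be sketched as a small Python helper. This is only an illustration using a plain dict: the helper name `to_recognition_config_fields` is hypothetical, and the values are copied unchanged from the question.

```python
def to_recognition_config_fields(cfg: dict) -> dict:
    """Rename the question's config keys to the names given in the comment."""
    renames = {
        "language": "language_code",
        "rate": "sample_rate_hertz",
    }
    # Keys without a rename (such as "encoding") pass through untouched.
    return {renames.get(key, key): value for key, value in cfg.items()}


question_config = {"encoding": "OGG_OPUS", "language": "fr", "rate": 16000}
renamed_config = to_recognition_config_fields(question_config)
```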

0 Answers