I'm using the Google Cloud Speech-to-Text API to transcribe a live microphone recording. This works well with plain LINEAR16 encoding, but I need to reduce bandwidth, so I switched to OGG Opus encoding.
The audio is recorded and encoded to OGG Opus on the client side using this library, then sent to a backend app over a websocket.
Finally, the backend app calls the STT API in streaming mode, which stays silent: no error, but no transcript either.
These are the options for the encoding:
```javascript
var options = {
  monitorGain: 0,
  recordingGain: 1,
  numberOfChannels: 1,
  encoderSampleRate: 16000,
  encoderPath: "./javascript/ogg_opus/encoderWorker.min.js",
  originalSampleRateOverride: 16000,
  streamPages: true,
  encoderApplication: 2048 // 2048 = OPUS_APPLICATION_VOIP
};
```
This is the configuration which is sent to the API:
```javascript
{
  encoding: 'OGG_OPUS',
  language: 'fr',
  rate: 16000
}
```
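The keys in this config (`language`, `rate`) differ from the field names of the v1 `RecognitionConfig` message, which are `encoding`, `sample_rate_hertz` and `language_code`, so presumably the backend renames them. A minimal sketch of such a mapping follows, with a sanity check that the rate is one the Speech-to-Text documentation lists for `OGG_OPUS` (8000, 12000, 16000, 24000 or 48000 Hz). `to_recognition_config` is a hypothetical helper, and it assumes the backend receives exactly the dict shown above.

```python
# Sample rates the Speech-to-Text docs list as valid for OGG_OPUS.
OGG_OPUS_RATES = {8000, 12000, 16000, 24000, 48000}


def to_recognition_config(client_config: dict) -> dict:
    """Rename the client-side keys to v1 RecognitionConfig field names.

    Hypothetical helper: assumes the backend receives a dict with the
    keys 'encoding', 'language' and 'rate', as in the question.
    """
    encoding = client_config["encoding"]
    rate = int(client_config["rate"])
    if encoding == "OGG_OPUS" and rate not in OGG_OPUS_RATES:
        raise ValueError(f"OGG_OPUS does not accept sample_rate_hertz={rate}")
    return {
        "encoding": encoding,
        "sample_rate_hertz": rate,
        "language_code": client_config["language"],
    }


print(to_recognition_config({"encoding": "OGG_OPUS", "language": "fr", "rate": 16000}))
```

Google's language support table lists regional codes such as `fr-FR` for `language_code`.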
This is how the audio is sent over the websocket:
```javascript
recorder.ondataavailable = function (typedArray) {
  var dataBlob = new Blob([typedArray], { type: 'audio/ogg' });
  websocket.emit('audio_data', dataBlob);
};
```
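With `streamPages: true`, each `ondataavailable` call is expected to carry complete Ogg pages, and per RFC 7845 the first page of an Ogg Opus stream is a beginning-of-stream page holding the `OpusHead` identification header. One way to narrow down the silent response is to check what the first chunk reaching the backend actually contains. This is a minimal sketch, assuming the websocket layer hands each `Blob` to the backend as raw bytes; `inspect_first_page` is a hypothetical name.

```python
import struct

# Fixed 27-byte Ogg page header (RFC 3533): capture pattern, version,
# header type, granule position, serial number, page sequence, CRC,
# number of segments. The segment table follows it.
OGG_PAGE_HEADER = struct.Struct("<4sBBqIIIB")
BOS_FLAG = 0x02  # header_type bit marking the beginning-of-stream page


def inspect_first_page(chunk: bytes) -> dict:
    """Check that a chunk opens with an Ogg BOS page holding an OpusHead packet."""
    if len(chunk) < OGG_PAGE_HEADER.size:
        raise ValueError("chunk is shorter than an Ogg page header")
    capture, _version, header_type, _granule, _serial, seq, _crc, n_segments = (
        OGG_PAGE_HEADER.unpack_from(chunk, 0)
    )
    if capture != b"OggS":
        raise ValueError("chunk does not start on an Ogg page boundary")
    if not header_type & BOS_FLAG:
        raise ValueError("first page is not a beginning-of-stream page")
    start = OGG_PAGE_HEADER.size + n_segments  # skip the segment table
    head = chunk[start:start + 19]  # OpusHead is 19 bytes for mapping family 0
    if len(head) < 19 or head[:8] != b"OpusHead":
        raise ValueError("first packet is not an OpusHead identification header")
    channels = head[9]
    _pre_skip, input_rate = struct.unpack_from("<HI", head, 10)
    return {"page_sequence": seq, "channels": channels, "input_sample_rate": input_rate}
```

If this check fails on the first message the backend sees (for example, because streaming started after the header pages were already emitted), the API receives Opus data without its identification header. The `input_sample_rate` field is informational only; RFC 7845 states that it does not set the decoding rate.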
The backend is written in Python, following this example.
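On the backend, the streaming request iterator has to forward the Ogg pages in the order they were sent, starting with the first one. This minimal sketch is similar in spirit to the buffering generator in Google's microphone streaming sample. It assumes the websocket handler puts each `audio_data` payload into a `queue.Queue` as bytes and puts `None` when the client disconnects; `audio_chunks` is a hypothetical name.

```python
import queue
from typing import Iterator, Optional


def audio_chunks(buff: "queue.Queue[Optional[bytes]]") -> Iterator[bytes]:
    """Yield queued audio bytes in arrival order until a None sentinel is seen."""
    while True:
        chunk = buff.get()  # block until the websocket handler delivers data
        if chunk is None:
            return
        data = [chunk]
        # Drain whatever is already queued so each request carries more audio.
        while True:
            try:
                chunk = buff.get(block=False)
            except queue.Empty:
                break
            if chunk is None:
                yield b"".join(data)  # flush what we have, then stop
                return
            data.append(chunk)
        yield b"".join(data)
```

Each yielded chunk would then be wrapped as `speech.StreamingRecognizeRequest(audio_content=chunk)`, as in Google's streaming samples. Joining chunks keeps the page order intact.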
Do you know what OGG Opus configuration the API needs in order to work?