
I'm trying to build an application that transcribes streaming audio. The idea is to capture the user's microphone stream with RecordRTC and send it in chunks to a Gunicorn server over Socket.IO. The server then creates an input stream for Azure Speech to Text:

[Figure: application architecture (JavaScript client, Python server, Azure)]

I'm trying to capture audio every x seconds with RecordRTC in a format that is accepted by Azure Speech to Text:

startRecording.onclick = function() {
    startRecording.disabled = true;
    // navigator.getUserMedia is deprecated; use the promise-based API instead
    navigator.mediaDevices.getUserMedia({ audio: true })
        .then(function(stream) {
            recordAudio = RecordRTC(stream, {
                type: 'audio',
                mimeType: 'audio/wav',
                recorderType: StereoAudioRecorder,
                desiredSampRate: 16000,   // sample rate accepted by Azure
                numberOfAudioChannels: 1, // mono
                timeSlice: 1000,          // call ondataavailable every 1000 ms
                ondataavailable: (blob) => {
                    socketio.emit('stream_audio', blob); // send the chunk to the server
                    console.log('sent blob');
                }
            });
            recordAudio.startRecording();
            stopRecording.disabled = false;
        })
        .catch(function(error) {
            // JSON.stringify on a DOMException prints "{}", so log the error itself
            console.error(error);
            startRecording.disabled = false; // let the user try again
        });
};

The blob passed to ondataavailable seems to contain a raw byte string. For Azure Speech to Text, however, I would prefer to receive the chunks in WAV format. The entire recording can be retrieved as WAV with getBlob(), but the client only produces that file after stopRecording() has been called.
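To see what each ondataavailable chunk actually contains, one option is to read its first bytes and check for a RIFF/WAVE header. The sketch below is a hypothetical helper (`inspectWavChunk` is not part of RecordRTC); it assumes the chunk has been read into an ArrayBuffer and follows the standard WAV layout of a "fmt " sub-chunk followed by a "data" sub-chunk.

```javascript
// Hypothetical helper: inspect an ArrayBuffer read from one ondataavailable chunk.
// Returns null if it does not start with a RIFF/WAVE header; otherwise returns
// the format fields and a view of the PCM bytes in the "data" sub-chunk.
function inspectWavChunk(buffer) {
  const view = new DataView(buffer);
  // Read a 4-character ASCII chunk id at the given byte offset.
  const tag = (offset) =>
    String.fromCharCode(
      view.getUint8(offset), view.getUint8(offset + 1),
      view.getUint8(offset + 2), view.getUint8(offset + 3));

  if (buffer.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    return null; // headerless audio or some other container
  }

  const info = { channels: 0, sampleRate: 0, bitsPerSample: 0, data: null };
  let offset = 12;
  // Each sub-chunk: 4-byte id, 4-byte little-endian size, then the payload.
  while (offset + 8 <= buffer.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === 'fmt ') {
      info.channels = view.getUint16(body + 2, true);
      info.sampleRate = view.getUint32(body + 4, true);
      info.bitsPerSample = view.getUint16(body + 14, true);
    } else if (id === 'data') {
      const end = Math.min(body + size, buffer.byteLength);
      info.data = new Uint8Array(buffer, body, end - body);
      break;
    }
    offset = body + size + (size % 2); // sub-chunks are padded to an even length
  }
  return info;
}
```

In the browser this could be called from the callback as `blob.arrayBuffer().then(inspectWavChunk).then(console.log)`. A `null` result means the chunk has no WAV header; otherwise `data` holds the PCM samples without the header.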

Is there a way for RecordRTC to return a blob in WAV format every x seconds? If not, what other options are there for streaming audio to Azure Speech to Text through Gunicorn?

All help is much appreciated!

Frank
Comment: So you are looking for code that receives the blob data and saves it as a .wav file on your Socket.IO server? – Stanley Gong Feb 16 '21 at 06:28

1 Answer


Frank, do you plan to use the Speech SDK to transcribe audio through a Push or Pull input stream? You do not need audio chunks in WAV format for this. You feed raw PCM to the input stream, in the default format of 16 kHz, 16 bits per sample, mono. See the sample code on GitHub.
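If the client produces the raw PCM itself, it has to convert Web Audio float samples into that default format. The sketch below is a hypothetical helper (`floatTo16kPcm` is not from the Speech SDK or RecordRTC); it assumes mono Float32 input in the range -1..1 at the AudioContext's sample rate (often 44100 or 48000 Hz). It downsamples by averaging each group of input samples, which is only a crude low-pass filter rather than a proper resampler.

```javascript
// Hypothetical helper: convert mono float samples at `inputRate` Hz into raw
// 16-bit signed little-endian PCM at `outputRate` Hz (16 kHz by default).
function floatTo16kPcm(samples, inputRate, outputRate = 16000) {
  if (outputRate > inputRate) {
    throw new RangeError('upsampling is not supported by this sketch');
  }
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(samples.length / ratio);
  const buffer = new ArrayBuffer(outLength * 2); // 2 bytes per 16-bit sample
  const view = new DataView(buffer);

  for (let i = 0; i < outLength; i++) {
    // Average the input samples that map onto output sample i.
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), samples.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    const avg = end > start ? sum / (end - start) : 0;
    const clamped = Math.max(-1, Math.min(1, avg));
    // Scale to the signed 16-bit range and write little-endian.
    view.setInt16(i * 2, clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff, true);
  }
  return buffer;
}
```

At 16 kHz, 16 bits and one channel, one second of audio is 16000 × 2 = 32000 bytes, so a 1000 ms timeSlice corresponds to about 32 KB per chunk. On the Python side these bytes could then be written to the Speech SDK's push stream (`PushAudioInputStream.write`) without any WAV header.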

Darren

Darren Cohen