
I'm building a speech-to-audio web app that takes mic input, converts the recording to an MP3 (using the mic-recorder-to-mp3 NPM package), and then sends it to the Node.js/Express server side for storage and to pass along in a subsequent POST request to the speech-to-text API (rev.ai).

The recording functions fine on the UI: I have the recording playing in an <audio> tag, and it sounds fine and is the full-length recording:

  stopBtn.addEventListener("click", () => {
    recorder
      .stop()
      .getMp3()
      .then(([buffer, blob]) => {
        let newBlob = new Blob(buffer); // note: created but never used
        recordedAudio.src = URL.createObjectURL(blob);
        recordedAudio.controls = true;
        sendData(blob);
      })
      .catch((e) => {
        console.log(e);
      });
  });

  function sendData(blob) {
    let fd = new FormData();
    fd.append('audio', blob);

    fetch('/audio', {
      headers: { Accept: "application/json", "Transfer-Encoding": "chunked" },
      method: "POST", body: fd
    });
  }

Now, at first, in my server-side Express route I was seeing multiple requests coming through per recording. I thought it was an error I could sort out later, so I put in a quick boolean check to see if the request was already being processed, and if so, just res.end() back to the UI.

This was all good and fine until I realized that only the first 4 seconds of the recording were being saved. This 4-second recording saved fine as an MP3 on the server side, plays correctly when opened in a music app, and even transcribed correctly on rev.ai, but still, it was only 4 seconds.

I realized that the audio blob was being sent to the server in chunks, and each chunk was one of the multiple requests I was seeing. So then I started looking into how to reassemble the chunks into one audio blob that can be saved as an MP3 and parsed correctly as audio by rev.ai, but nothing I've tried so far has worked. Here is my latest attempt:

app.post("/audio", (req, res) => {
  let audioBlobs = [];
  let audioContent;
  let audioChunk;
  let filename = `narr-${Date.now()}.mp3`;
  //let processed = false;

  req.on("readable", () => {
    //if (!processed) {
    //  processed = true;
    //  let audioChunk = req.read();
    //}
    // req.read() is synchronous, so no await is needed here
    while (null !== (audioChunk = req.read())) {
      console.log("adding chunk");
      audioBlobs.push(audioChunk);
    }
  });

  req.on("end", () => {
    audioContent = audioBlobs.join('');

    fs.writeFile(`./audio/${filename}`, audioContent, function (err) {
      if (err) {
        console.log("an error occurred");
        console.error(err);
        return res.end(); // return so we don't fall through below
      }

      const stream = fs.createReadStream(`./audio/${filename}`);
      client.submitJobAudioData(stream, filename, {})
        .then((data) => {
          waitForRevProcessing(data.id);
        })
        .catch((e) => {
          console.log("caught an error");
          console.log(e);
        });
      res.end();
    });
  });
});

The blob is saved on the server side with this code, but it's not playable in a music app, and rev.ai rejects the recording because it doesn't interpret the blob as an audio file.

Something about the way I'm reassembling the chunks is corrupting the integrity of the MP3 format.
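For reference, joining Node Buffers with `join('')` is indeed lossy: it coerces each chunk to a UTF-8 string, and any byte sequence that isn't valid UTF-8 (like an MP3 frame header's 0xFF sync byte) gets replaced. `Buffer.concat` is the byte-exact alternative. A minimal, self-contained demonstration, with illustrative byte values (not a real recording):

```javascript
// join("") coerces each Buffer to a UTF-8 string; invalid byte
// sequences become U+FFFD replacement characters, corrupting the data.
const chunk1 = Buffer.from([0xff, 0xfb, 0x90, 0x64]); // illustrative MP3-like header bytes
const chunk2 = Buffer.from([0x00, 0x0f, 0xf0, 0x80]);

const joined = Buffer.from([chunk1, chunk2].join(""));  // lossy round-trip
const concatenated = Buffer.concat([chunk1, chunk2]);   // byte-exact

console.log(concatenated.length);          // 8 -- all original bytes present
console.log(concatenated[0] === 0xff);     // true -- first byte intact
console.log(joined.equals(concatenated));  // false -- join() corrupted the data
```

The same `Buffer.concat` call works on the array of chunks collected from the request stream.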

I'm thinking this could be for a few reasons:

  • The chunks could be arriving at the server out of order, although that wouldn't make much sense, considering that with the boolean check in place it was seemingly saving the first chunk, not some middle chunk
  • The last chunk is being left "open", or some metadata is missing or padding is messing with the encoding
  • These might not be the correct events to listen to for starting/ending the assembly

I'm hoping that Express or Node's http module has something built in to handle this automatically and that I'm doing this manual reassembly unnecessarily. I was pretty surprised there was nothing off-the-shelf in Express for this, but maybe it's not as common a use case as I imagined?
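One relevant point: the fetch call above sends a FormData body, which arrives as multipart/form-data, and Express core deliberately ships no multipart parser (middleware such as multer is the usual choice). So even a byte-perfect copy of the raw request body is not a valid MP3, because the file bytes are wrapped in boundary lines and part headers. A sketch with a hypothetical minimal multipart body (boundary string and byte values invented for illustration):

```javascript
// Hypothetical minimal multipart/form-data body, as a browser would frame it.
const boundary = "----WebKitFormBoundaryabc123";
const mp3Bytes = Buffer.from([0xff, 0xfb, 0x90, 0x64]); // stand-in for file bytes

const body = Buffer.concat([
  Buffer.from(`--${boundary}\r\n`),
  Buffer.from('Content-Disposition: form-data; name="audio"; filename="blob"\r\n'),
  Buffer.from("Content-Type: audio/mpeg\r\n\r\n"),
  mp3Bytes,
  Buffer.from(`\r\n--${boundary}--\r\n`),
]);

// Naive extraction: the file bytes sit between the blank line that ends
// the part headers and the closing boundary. (A real parser like multer
// handles multiple parts, streaming, and edge cases.)
const start = body.indexOf("\r\n\r\n") + 4;
const end = body.lastIndexOf(`\r\n--${boundary}--`);
const extracted = body.subarray(start, end);

console.log(extracted.equals(mp3Bytes)); // true -- only the file bytes remain
```

Writing `body` straight to an .mp3 file keeps the boundary and header text in the file, which is enough on its own to make players and rev.ai reject it.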

Any help that can be offered would be greatly appreciated.

Eric
  • Just make your Blob from an array of the reassembled chunks after the POST has sent them all... as each chunk is received, just push it, and at the end make the Blob from the array of data arrays received over HTTP(S). – Robert Rowntree Nov 22 '20 at 19:49
  • @RobertRowntree that's exactly what I'm doing above? – Eric Nov 22 '20 at 20:29
  • Verify that you're getting every chunk produced by the mic onto the wire. On the server, verify that every posted chunk is coming off the wire. Review the docs for posting binary [dataArrays]. If it's all binary data then you should be able to marshal it off the wire, push it to an array [of binary arrays], and at the end (server-side, with some Blob capability) just construct a new Blob, and it should all be good. – Robert Rowntree Nov 22 '20 at 20:49
  • https://www.npmjs.com/package/opus-recorder#callback-handlers – this may NOT be the recorder implementation you use. I looked at it recently (I use Opus on web apps for audio), and the streamPages setting controls the frequency of calls to "onDataAvailable". By setting streaming OFF you guarantee the ENTIRE audio will be in a single call that flushes the microphone buffers. – Robert Rowntree Dec 02 '20 at 15:02
