I'm building a speech-to-audio web app that takes mic input, converts the recording to an MP3 (using the mic-recorder-to-mp3 NPM package), and then sends it to the Node.js/Express server side for storage and to pass along in a subsequent POST request to the speech-to-text API (rev.ai).
The recording works fine on the UI: I have the recording playing in an <audio> tag, and it sounds fine and is the full length:
    stopBtn.addEventListener("click", () => {
      recorder
        .stop()
        .getMp3().then(([buffer, blob]) => {
          let newBlob = new Blob(buffer);
          recordedAudio.src = URL.createObjectURL(blob);
          recordedAudio.controls = true;
          sendData(blob);
        }).catch((e) => {
          console.log(e);
        });
    });

    function sendData(blob) {
      let fd = new FormData();
      fd.append('audio', blob);
      fetch('/audio', {
        headers: { Accept: "application/json", "Transfer-Encoding": "chunked" },
        method: "POST", body: fd
      });
    }
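For comparison, here is a minimal sketch of the same upload without the hand-set Transfer-Encoding header. As far as I can tell, Transfer-Encoding is a forbidden header name in the Fetch standard, so the browser ignores that value anyway, and fetch fills in the multipart Content-Type itself when the body is a FormData. The helper name `buildUploadForm`, the filename `recording.mp3`, and the sample bytes are assumptions for illustration only, not part of my actual app.

```javascript
// Hypothetical helper: wraps the recorded MP3 blob in a multipart form.
function buildUploadForm(blob) {
  const fd = new FormData();
  // The third argument sets the filename reported in the part's Content-Disposition header.
  fd.append('audio', blob, 'recording.mp3');
  return fd;
}

// Sends the form in a single POST. No Content-Type or Transfer-Encoding header is set by hand:
// fetch derives the multipart Content-Type (including the boundary) from the FormData body.
function sendData(blob) {
  return fetch('/audio', {
    method: 'POST',
    headers: { Accept: 'application/json' },
    body: buildUploadForm(blob),
  });
}

// Illustration with made-up bytes standing in for real MP3 data.
const sampleBlob = new Blob([new Uint8Array([0xff, 0xfb, 0x90, 0x64])], { type: 'audio/mp3' });
const form = buildUploadForm(sampleBlob);
console.log(form.get('audio').name, form.get('audio').size);
```

Either way, the body that reaches the server is multipart/form-data, not a bare MP3 file.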
Now, at first in my server-side express route I was seeing multiple requests coming through per recording and thought it was an error that I could sort out later, so I put a quick boolean check to see if the request was already being processed and if so just res.end() back to the UI.
This was all well and good until I realized that only the first 4 seconds of the recording were being saved. This 4-second recording saved fine as an MP3 on the server side, played correctly when opened in a music app, and was transcribed correctly by rev.ai, but it was still only 4 seconds long.
I realized that the audio blob was being sent to the server in chunks, and each chunk was part of the multiple requests I was seeing. So I started looking into how to reassemble the chunks into one audio blob that can be saved as an MP3 and parsed correctly as audio by rev.ai, but nothing I've tried so far has worked. Here is my latest attempt:
    app.post("/audio", async (req, res) => {
      let audioBlobs = [];
      let audioContent;
      let filename = `narr-${Date.now()}.mp3`;
      //let processed = false;
      req.on('readable', async () => {
        //if(!processed){
        //processed = true;
        //let audioChunk = await req.read();
        //}
        while(null !== (audioChunk = await req.read())) {
          console.log("adding chunk")
          audioBlobs.push(audioChunk);
        }
      });
      req.on("end", () => {
        audioContent = audioBlobs.join('');
        fs.writeFile(`./audio/${filename}`, audioContent, async function(err) {
          if (err) {
            console.log("an error occurred");
            console.error(err);
            res.end();
          }
          const stream = fs.createReadStream(`./audio/${filename}`);
          let job = await client.submitJobAudioData(stream, filename, {}).then(data => {
            waitForRevProcessing(data.id);
          }).catch(e => {
            console.log("caught an error");
            console.log(e);
          });
          res.end();
        })
      });
    });
The blob is saved on the server-side with this code, but it's not playable in a music app and rev.ai rejects the recording as it does not interpret the blob as an audio file.
Something about the way I'm reassembling the chunks is corrupting the integrity of the MP3 format.
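One thing I can at least demonstrate in isolation is the difference between joining the chunks as strings, as `audioBlobs.join('')` does above, and concatenating them as raw bytes. The sketch below assumes the chunks returned by `req.read()` are Node Buffers (which should be the case when no encoding is set on the request stream); the byte values are made up and are not real MP3 data.

```javascript
// Sketch: joining Buffer chunks as strings vs. concatenating them as bytes.
// The byte values below are made up for illustration; they are not real MP3 data.
const chunkA = Buffer.from([0xff, 0xfb, 0x90, 0x64]); // includes bytes that are not valid UTF-8
const chunkB = Buffer.from([0x00, 0x0f, 0xf0, 0xaa]);
const chunks = [chunkA, chunkB];

// Array.prototype.join calls toString() on each Buffer, which decodes it as UTF-8.
// Invalid byte sequences become U+FFFD, so the original bytes cannot be recovered.
const joinedAsString = Buffer.from(chunks.join(''), 'utf8');

// Buffer.concat copies the raw bytes without any decoding step.
const concatenated = Buffer.concat(chunks);

const original = Buffer.from([0xff, 0xfb, 0x90, 0x64, 0x00, 0x0f, 0xf0, 0xaa]);
console.log("join('') preserved bytes:", joinedAsString.equals(original));
console.log('Buffer.concat preserved bytes:', concatenated.equals(original));
```

Even with the bytes preserved, the request body here is multipart/form-data (because the client sends a FormData), so the concatenated result would still contain the multipart boundary lines and part headers around the MP3 data.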
I'm thinking this could be for a few reasons:
- The chunks could be coming to the server-side out of order, although it wouldn't make a whole lot of sense considering that when I had the boolean check in place it was seemingly saving the first chunk and not mid-chunks
- The last chunk is being left "open" or there's some metadata that's missing or padding that's messing with the encoding
- These might not be the correct events to listen to for starting/ending the assembly
I'm hoping that Express/the http node module have something built-in to automatically handle this and I'm doing this manual reassembly unnecessarily - I was pretty surprised there was nothing off-the-shelf in Express to handle this, but maybe it's not as common a use case as I imagined?
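To make concrete what such built-in or off-the-shelf handling would have to do, here is a rough sketch of pulling the file bytes out of a multipart/form-data body using only Node's Buffer API. The function name `extractFirstPart`, the boundary string, and the sample payload are all hypothetical; in a real request the boundary would come from the Content-Type header. This is only an illustration of the body layout under those assumptions, and a multipart-parsing middleware such as multer or busboy is presumably what one would actually use rather than hand-rolling it.

```javascript
// Hypothetical helper: returns the raw content bytes of the first part
// in a multipart/form-data body, or null if the layout is not as expected.
function extractFirstPart(body, boundary) {
  const delimiter = Buffer.from(`--${boundary}`);
  const headerEnd = Buffer.from('\r\n\r\n');

  const start = body.indexOf(delimiter);
  if (start === -1) return null;

  // The part headers end at the first blank line after the delimiter.
  const headersStop = body.indexOf(headerEnd, start);
  if (headersStop === -1) return null;
  const contentStart = headersStop + headerEnd.length;

  // The part content ends at the CRLF that precedes the next delimiter.
  const closing = Buffer.concat([Buffer.from('\r\n'), delimiter]);
  const contentEnd = body.indexOf(closing, contentStart);
  if (contentEnd === -1) return null;

  return body.subarray(contentStart, contentEnd);
}

// Made-up payload standing in for MP3 bytes; it deliberately contains a CRLF.
const boundary = 'example-boundary';
const payload = Buffer.from([0xff, 0xfb, 0x90, 0x64, 0x0d, 0x0a, 0x00]);
const body = Buffer.concat([
  Buffer.from(
    `--${boundary}\r\n` +
    'Content-Disposition: form-data; name="audio"; filename="blob"\r\n' +
    'Content-Type: audio/mp3\r\n\r\n'
  ),
  payload,
  Buffer.from(`\r\n--${boundary}--\r\n`),
]);

const extracted = extractFirstPart(body, boundary);
console.log('extracted payload matches:', extracted.equals(payload));
```

The point of the sketch is only that the audio sits between part headers and a closing boundary, so writing the whole request body to disk as an .mp3 would include those extra bytes.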
Any help that can be offered would be greatly appreciated.