I'm attempting to write a Node application that transcribes audio from a microphone via AWS's streaming transcription service. What I have so far can be found in this repository (it's small).
Unfortunately, the above doesn't work. I believe the bug is in how I take the data provided by the microphone stream and transform it before passing it to the writable transcriber stream, because I have confirmed that the other two components of the app work:
- I've written a piece of the app to pipe the mic to the speakers that proves that the mic stream works as expected.
- When sending requests over the WebSocket to the transcription service, it sends non-exceptional responses back, albeit empty, proving that the transcription service client works as expected.
As a side note, I'm not familiar with handling audio data and encoding (decoding?) it to PCM. I'm not even sure whether what the mic-stream gives me is PCM, or whether I need to decode from or encode to PCM before providing it to the transcription service. All of this is to say, I'm pretty sure the byte handling is the issue.
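To make the encoding question concrete: Amazon Transcribe streaming accepts PCM only as signed 16-bit little-endian samples. If the mic stream were emitting 32-bit little-endian float samples in the range [-1, 1] (an assumption; it may already be 16-bit PCM, in which case no conversion is needed), the transformation would look roughly like the sketch below. The names `floatTo16BitPcm` and `createPcmEncoder` are hypothetical and are not taken from the repository.

```javascript
const { Transform } = require('node:stream');

// Convert float samples in [-1, 1] to signed 16-bit little-endian PCM bytes.
// Assumes `samples` is a Float32Array (or any array-like of numbers).
function floatTo16BitPcm(samples) {
  const out = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return out;
}

// Hypothetical Transform stream that applies the conversion chunk by chunk.
// Assumes incoming chunks are Buffers of 32-bit little-endian floats.
function createPcmEncoder() {
  let leftover = Buffer.alloc(0);
  return new Transform({
    transform(chunk, _encoding, callback) {
      // A chunk may end mid-sample, so carry the remainder to the next call.
      const data = Buffer.concat([leftover, chunk]);
      const usable = data.length - (data.length % 4);
      leftover = data.subarray(usable);
      const samples = new Float32Array(usable / 4);
      for (let i = 0; i < samples.length; i++) {
        samples[i] = data.readFloatLE(i * 4);
      }
      callback(null, floatTo16BitPcm(samples));
    },
  });
}
```

With placeholder stream names, this would be wired up as `micStream.pipe(createPcmEncoder()).pipe(transcriberStream)`. The sample rate declared to the transcription service would also need to match the rate the microphone actually records at.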
Any help getting this sorted would be greatly appreciated.
Thanks, Geoff