I'm working on a project that requires live audio to be transcribed in real time. I tried AWS Transcribe over WebSockets, using their starter code available on GitHub.
Currently, for testing, I have an audio file taken from YouTube which I'm streaming to an icecast2 server hosted on a cloud VM. The ffmpeg command for streaming to the icecast2 server is:
ffmpeg -re -i yt.wav -ar 44100 -ac 1 -c:a libvorbis -aq 5 -content_type 'audio/ogg' -vn -f ogg icecast://source:hackme@serverIP:8000/mystream.ogg
I've modified the code from GitHub so that, instead of reading audio data from a microphone, it reads the audio from the icecast2 server. The problem is that it sometimes doesn't return a transcript at all, and at other times it returns the wrong transcript.
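One thing that may be worth checking: the stream above is Ogg Vorbis at 44100 Hz, while Transcribe streaming expects audio in one of its supported encodings, such as signed 16-bit little-endian PCM, at the sample rate declared in the request. If the modified code forwards the Icecast bytes as-is, that mismatch could explain missing or wrong transcripts. Below is a minimal Python sketch of decoding the stream to raw PCM with ffmpeg before sending it. The URL, the 16000 Hz rate, the 100 ms chunk length, and the function names are all assumptions made for illustration; none of this is taken from the starter code.

```python
import subprocess
from typing import Iterator, List

SAMPLE_RATE = 16000    # assumed; must match the rate declared to Transcribe
BYTES_PER_SAMPLE = 2   # signed 16-bit PCM


def build_decode_command(stream_url: str, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that decodes a stream to mono s16le PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", stream_url,           # e.g. the Icecast mount point
        "-vn",                      # ignore any video stream
        "-ac", "1",                 # mono
        "-ar", str(sample_rate),    # resample to the target rate
        "-acodec", "pcm_s16le",     # signed 16-bit little-endian samples
        "-f", "s16le",              # raw output, no container
        "pipe:1",                   # write to stdout
    ]


def chunk_size_bytes(sample_rate: int = SAMPLE_RATE, chunk_ms: int = 100) -> int:
    """Number of bytes in chunk_ms milliseconds of mono 16-bit PCM."""
    return sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000


def read_pcm_chunks(stream_url: str, sample_rate: int = SAMPLE_RATE,
                    chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM chunks decoded from the stream (requires ffmpeg on PATH)."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    proc = subprocess.Popen(build_decode_command(stream_url, sample_rate),
                            stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(size)
            if not chunk:           # stream ended or ffmpeg exited
                break
            yield chunk
    finally:
        proc.kill()
        proc.wait()


if __name__ == "__main__":
    # Hypothetical URL based on the mount point used in the ffmpeg command above.
    for pcm in read_pcm_chunks("http://serverIP:8000/mystream.ogg"):
        pass  # each chunk would be wrapped in an audio event and sent to Transcribe
```

Because the source is already paced with -re, chunks arrive at roughly real-time speed, which is what a live transcription session would normally see.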
I'd really appreciate it if anyone could help.