Somewhat to my surprise, Google's default speech recognition seems to completely ignore music and ambient noise. The problem is that, for my use case, I actually want it to try to transcribe the music!
I'm using the Web Speech API in Chrome 72 with the demo they have.
I can't get it to pick up things said from music at all, even when I place the speaker next to the mic.
I also can't get it to pick up any YouTube videos or other videos playing online.
It also doesn't pick up anything my Alexa says.
I have an Android phone, so I'm assuming they're doing something similar to what Amazon does in its commercials: playing an inaudible sound that is used to cancel out the recording. Is there any way to disable this?
It also doesn't work if I play music from my Mac or PC directly.
However, it DOES transcribe if I video chat with someone (using WebRTC, if that matters) and what they say is played through the speakers.
For anyone wondering, I want it to transcribe a video, playing on the same page, of a person speaking with no background music. I'm using their demo code to see if this is viable.
Is there any way to recognize these sounds?
To clarify, I'm asking specifically how to disable this for the Web Speech API and not in general for speech recognition.
The Web Speech API is a very specific way to request speech recognition from the browser itself (in Chrome it goes to Google, in Firefox I believe they have a native solution).
There's more info on it here: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API but the documentation is thin because behavior varies across browsers, and I am specifically asking how to avoid this in Chrome.
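For reference, here is a minimal sketch of the kind of setup I'm testing. It is not the demo's actual code, just my approximation of it. The helper name `createRecognizer`, the local type names (`RecognizerLike`, `RecognizerCtor`, `SpeechScope`), and the `en-US` language setting are my own assumptions for illustration. The recognizer properties and events it uses (`continuous`, `interimResults`, `lang`, `onresult`, `onerror`, `start()`) are the ones described on the MDN page, and Chrome exposes the constructor as `webkitSpeechRecognition`.

```typescript
// Minimal shapes for the parts of the API used here. These type names are
// hypothetical and declared locally so the sketch does not rely on DOM typings.
interface RecognizerLike {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: any) => void) | null;
  onerror: ((event: any) => void) | null;
  start(): void;
}

type RecognizerCtor = new () => RecognizerLike;

interface SpeechScope {
  SpeechRecognition?: RecognizerCtor;
  webkitSpeechRecognition?: RecognizerCtor;
}

// Builds and configures a recognizer from the given scope (normally `window`).
// Returns null when the scope offers no speech recognition constructor.
function createRecognizer(
  scope: SpeechScope,
  onTranscript: (text: string, isFinal: boolean) => void,
  onError: (code: string) => void
): RecognizerLike | null {
  // Prefer the unprefixed name, fall back to Chrome's prefixed constructor.
  const Ctor = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Ctor) {
    return null;
  }

  const recognition = new Ctor();
  recognition.continuous = true;     // keep listening after each phrase
  recognition.interimResults = true; // report partial guesses as well
  recognition.lang = "en-US";        // assumption: English audio

  recognition.onresult = (event: any) => {
    // Only the results from resultIndex onward are new in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      onTranscript(result[0].transcript, result.isFinal);
    }
  };

  recognition.onerror = (event: any) => {
    // event.error is a short code such as "no-speech".
    onError(event.error);
  };

  return recognition;
}

// In the page I would call it roughly like this (cast only if the typings complain):
//   const rec = createRecognizer(window as SpeechScope, showText, showError);
//   if (rec) rec.start();
```

With this running, speech from a WebRTC call shows up in `onTranscript`, while music, YouTube audio, and Alexa produce nothing.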