Chrome 89 has a Live Caption feature that transcribes English speech from audio or video as it plays. It even works offline, so the transcription must be running locally rather than on Google's servers.
Is there any way to use this feature programmatically, e.g. to give it an audio file and capture the transcribed text?
EDIT: This guy wrote some code that lets you do it, but you need to figure out how to disassemble and patch Google's libsoda yourself. I did get it working, though.