I'm working on an app for people stuck in superfluous meetings who need to know when someone asks them a question.
My plan is:
1. Stream the audio of the meeting (what normally comes out of my speakers) into a speech-to-text program
2. Stream that into something that watches for my name and/or rising intonation for questions
3. Have the program "ding" when someone asks me a question. Then I can quickly read the text and answer.
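To make the second and third steps of the plan concrete, here is a minimal sketch of what I have in mind, assuming the transcript already arrives as one string per utterance. The function names, the list of question openers, and the name "Alex" are all hypothetical, and it uses text cues only (a trailing question mark or a question word near the start) as a stand-in for real intonation detection, which would need the audio itself.

```python
import re
from typing import Iterable, Iterator

# Hypothetical cue list: words that often open a spoken question.
QUESTION_OPENERS = {
    "who", "what", "when", "where", "why", "how",
    "can", "could", "do", "does", "did",
    "is", "are", "will", "would", "should",
}


def mentions_name(text: str, name: str) -> bool:
    """True if `name` appears as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE) is not None


def looks_like_question(text: str) -> bool:
    """Rough text-only check: trailing '?' or a question word in the first three words."""
    stripped = text.strip().lower()
    if stripped.endswith("?"):
        return True
    words = re.findall(r"[a-z']+", stripped)
    return any(word in QUESTION_OPENERS for word in words[:3])


def alerts(transcript: Iterable[str], name: str) -> Iterator[str]:
    """Yield each utterance that both mentions `name` and looks like a question."""
    for utterance in transcript:
        if mentions_name(utterance, name) and looks_like_question(utterance):
            yield utterance


if __name__ == "__main__":
    sample = [
        "Let's move on to the budget.",
        "Alex, can you share the numbers",
        "Thanks Alex, that is all.",
    ]
    for hit in alerts(sample, "Alex"):
        # "\a" is the terminal bell character, used here as the "ding".
        print("\a" + hit)
```

Because `alerts` is a generator, it could consume a live stream of utterances just as well as a list, so the missing piece is still getting the speaker audio into text in the first place.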
The hard part is step (1). Every speech-to-text program I found accepts only audio files as input and cannot stream from whatever channel goes to the speakers/headphones. The assistive programs I found, on the other hand, take over keyboard input. Ideally, users should be able to keep doing productive work by typing in other apps during the meeting, so that kind of solution won't work.
So I'm looking for something I can use on OS X that will either handle step (1) or, even better, do most of the steps above for me.
I've done research into solutions and can't find anything for step (1). I'm including the other steps because there may be a more creative solution for the overall program (such as some other assistive technology not for dictation) that I don't know about.