How can I get live transcription on OS X (without audio files)?

Question

I'm working on an app for people stuck in superfluous meetings who need to know when someone asks them a question.

My plan is:

1. Stream the audio of the meeting (what normally comes out of my speakers) into a speech-to-text program.
2. Stream that text into something that watches for my name and/or the rising intonation that signals a question.
3. Have the program "ding" when someone asks me a question, so I can quickly read the text and answer.
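A minimal Python sketch of steps 2 and 3, assuming the speech-to-text stage already delivers one utterance per line of text. Plain text carries no intonation, so this swaps in a text heuristic: a trailing `?` (if the recognizer adds punctuation) or a leading question word. `QUESTION_WORDS`, `looks_like_question_for`, and `watch` are hypothetical names, and the terminal bell character stands in for a real "ding".

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical list of words that often open an English question.
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "can", "could",
    "do", "does", "did", "is", "are", "will", "would", "should",
}


def looks_like_question_for(name: str, utterance: str, require_both: bool = True) -> bool:
    """Heuristic: does the utterance mention `name` and read like a question?

    A trailing '?' or a leading question word stands in for rising intonation,
    which is not visible in a transcript.
    """
    text = utterance.strip().lower()
    # Whole-word match, so "Al" does not match "Alex".
    mentions_name = re.search(r"\b" + re.escape(name.lower()) + r"\b", text) is not None
    words = text.split()
    first_word = words[0].strip(",.!?") if words else ""
    is_question = text.endswith("?") or first_word in QUESTION_WORDS
    if require_both:
        return mentions_name and is_question
    return mentions_name or is_question


def watch(utterances: Iterable[str], name: str,
          ding: Callable[[], None] = lambda: print("\a", end="", flush=True)) -> Iterator[str]:
    """Yield every utterance that should trigger a ding, calling `ding` first."""
    for utterance in utterances:
        if looks_like_question_for(name, utterance):
            ding()  # default: terminal bell character as a stand-in "ding"
            yield utterance
```

Passing `require_both=False` makes the check fire on either the name or a question, matching the "and/or" in step 2. Detecting actual rising intonation would need the audio itself, not the transcript.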

The hard part is step (1). All the speech-to-text programs I found accept only audio files as input and cannot stream from whatever channel goes to the speakers or headphones. The assistive programs I found, on the other hand, take over keyboard input. Ideally, users will be able to do productive work by typing in other apps during the meeting, so that kind of solution won't work.

So I'm looking for something I can use on OS X that will either handle step (1) or, even better, do most of the steps above for me.

I've researched possible solutions and can't find anything for step (1). I'm including the other steps because there may be a more creative solution for the overall program (such as an assistive technology not meant for dictation) that I don't know about.

What a fantastic idea, exactly what I need! Surely there must be a way to do live audio transcription from the sound card using AI or ML nowadays? There are some options that do it by adding a participant to the meeting, but that is too obvious. — Dirk R, Dec 15 '22 at 09:55

score 1 · Accepted Answer · answered Jan 04 '17 at 20:31

You can use one of many APIs, for example Google's streaming speech API, though it is not entirely free.

If you can tolerate lower accuracy, you can use open-source software such as CMUSphinx.

Another problem is getting the audio stream out of the VoIP software: you would have to hack that yourself. Alternatively, you can re-record what is played on the speakers, though that is not always a good idea.
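Both a streaming cloud API and CMUSphinx consume audio incrementally, so "live instead of files" mostly comes down to feeding the engine small, fixed-size chunks as they arrive. A minimal sketch of that framing step, assuming raw 16 kHz mono 16-bit PCM arrives on some binary stream (for example stdin, piped from a capture tool). The format constants and the 100 ms frame length are assumptions, and the actual recognizer call is left out because it differs per engine.

```python
import io
from typing import BinaryIO, Iterator

# Assumed audio format: 16 kHz, mono, 16-bit signed PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
FRAME_MS = 100  # assumed frame length: a latency vs. per-request overhead trade-off


def pcm_frames(stream: BinaryIO, frame_ms: int = FRAME_MS,
               sample_rate: int = SAMPLE_RATE_HZ,
               bytes_per_sample: int = BYTES_PER_SAMPLE) -> Iterator[bytes]:
    """Yield fixed-duration frames of raw mono PCM read from a binary stream.

    Short reads (common on pipes) are accumulated until a frame is full.
    The last frame may be shorter if the stream ends mid-frame.
    """
    frame_bytes = sample_rate * bytes_per_sample * frame_ms // 1000
    buf = bytearray()
    while True:
        data = stream.read(frame_bytes - len(buf))
        if not data:  # EOF: flush any partial frame and stop
            if buf:
                yield bytes(buf)
            return
        buf.extend(data)
        if len(buf) == frame_bytes:
            yield bytes(buf)
            buf.clear()


if __name__ == "__main__":
    # Stand-in for live audio: 0.25 s of silence in an in-memory stream.
    fake_audio = io.BytesIO(bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE // 4))
    for frame in pcm_frames(fake_audio):
        print(len(frame))  # prints 3200, 3200, 1600 (one per line)
```

In a real pipeline, `pcm_frames(sys.stdin.buffer)` would replace the in-memory stand-in, and each frame would go to the recognizer's streaming interface as soon as it is yielded.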

score 1 · Answer 2 · answered Jan 05 '17 at 12:30

1) I have used Loopback for inter-app audio routing; it is essentially a virtual mixer that pipes audio from one app into another. It shows up as an audio input device and also allows monitoring, so you can listen to the audio while also streaming it to another app.
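Because the virtual device appears as an ordinary input, a capture program only has to pick it out of the device list. A minimal Python sketch, assuming the list comes from something like `sounddevice.query_devices()` (a third-party package, whose device dicts include `name` and `max_input_channels` keys) and that the device's name contains "Loopback"; the user can rename it, so that default is an assumption. `find_input_device` is a hypothetical helper.

```python
from typing import Iterable, Mapping, Optional


def find_input_device(devices: Iterable[Mapping], name_fragment: str = "Loopback") -> Optional[int]:
    """Return the position of the first device that can record and whose name
    contains `name_fragment` (case-insensitive), or None if there is none.

    `devices` is assumed to be a sequence of dicts with "name" and
    "max_input_channels" keys, listed in device-ID order.
    """
    needle = name_fragment.lower()
    for index, device in enumerate(devices):
        can_record = device.get("max_input_channels", 0) > 0
        if can_record and needle in str(device.get("name", "")).lower():
            return index
    return None


if __name__ == "__main__":
    # Hypothetical device list; on a real Mac it might come from
    # sounddevice.query_devices() (third-party package, assumed installed).
    devices = [
        {"name": "MacBook Pro Microphone", "max_input_channels": 1},
        {"name": "MacBook Pro Speakers", "max_input_channels": 0},
        {"name": "Loopback Audio", "max_input_channels": 2},
    ]
    print(find_input_device(devices))  # 2
```

The returned index could then be passed as the `device` argument when opening an input stream, so the transcriber records the meeting audio rather than the microphone.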

2 and 3) This is not really my area of expertise, but to start my research I would probably investigate Google's APIs (as Nikolay said).
