Is there an API (or any hacks) to access Enhanced Dictation in Mac OS X Mavericks?

Question

I am trying to find an easy way to transcribe an audio file to text. CMU Sphinx, Julius, etc. are difficult for someone without a background in speech recognition, since they require configuring language models, acoustic models, and so on.

I wondered if there was a way to pipe my audio file into the "Enhanced Dictation" feature of Mac OS 10.9 Mavericks, which allows for local, offline voice dictation.

I thought I was being clever when I ran a patch cord from my headphone jack to my line in, but unfortunately, when you start dictating, it mutes all other audio that is playing. I will accept any answer that explains how to disable this muting.

Perhaps not Enhanced Dictation (alas, I'm stuck with Snow Leopard, so I don't know what it is), but yesterday I read the Speech Programming Guide, and it said that if you need more control than `NSSpeechRecognizer` offers, you can use the low-level Carbon API. I am still searching for the relevant documentation. — 11684, Mar 09 '14 at 17:01
Ha! Found it! https://developer.apple.com/library/mac/documentation/Carbon/Reference/Speech_Recognition_Manager/Reference/reference.html#//apple_ref/doc/uid/TP30000209 — 11684, Mar 09 '14 at 17:03

user2934244 · Answer 1 · answered Nov 18 '13 at 01:40

I haven't found a direct way of doing this. However, you could use Soundflower as a workaround.

For example, in VLC choose Audio->Audio Device->Soundflower (2ch) as the output. Then, in System Preferences->Dictation & Speech->Dictation, select Soundflower (2ch) from the drop-down under the microphone icon. Start playback in VLC, then start dictation (for example in TextEdit), and the transcription should appear. The downside of this approach is that it is slow (limited to roughly real-time playback of the audio) and not well suited to an automated workflow.

Note: you have to start audio playback before switching to TextEdit and starting Enhanced Dictation.

In my experience, this does not work. I have found it necessary to use Audio Hijack Pro as an intervening step. — brannerchinese, May 14 '16 at 18:01

score 2 · Answer 2 · answered Mar 30 '20 at 16:10

macOS 10.15 (Catalina) added an API that exposes the underlying speech recognition system and can transcribe audio from a file or from a live input such as the microphone. Since it has been available on iOS since iOS 10, I guess it has been ported to the Mac.

It has some limitations. First, it sends audio to Apple's servers for transcription, which may matter to you (Dictation used to have an on-device option, but that may have disappeared in Catalina). Probably because of that, it processes audio in chunks of no more than one minute.

See Apple's documentation for the Speech framework for details of the API.
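A minimal Swift sketch of transcribing a file with the Speech framework is shown here. It assumes macOS 10.15 or later, US English audio, and an app bundle whose Info.plist contains `NSSpeechRecognitionUsageDescription` (a process without that key is likely to be terminated when it requests authorization, so a bare command-line tool may not work). The function name `transcribe(fileURL:locale:completion:)` and the `TranscribeSketch` error domain are hypothetical, chosen for illustration. The sketch also opts into on-device recognition via `requiresOnDeviceRecognition` when the recognizer reports `supportsOnDeviceRecognition`; otherwise the audio goes to Apple's servers as described above.

```swift
import Foundation
import Speech

// Hypothetical helper: transcribe one local audio file and report the final text.
// Assumes macOS 10.15+ and an Info.plist with NSSpeechRecognitionUsageDescription.
func transcribe(fileURL: URL,
                locale: Locale = Locale(identifier: "en-US"),
                completion: @escaping (Result<String, Error>) -> Void) {
    // Ask the user for permission to use speech recognition.
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized else {
            completion(.failure(NSError(domain: "TranscribeSketch", code: 1,
                userInfo: [NSLocalizedDescriptionKey: "Speech recognition not authorized"])))
            return
        }
        // The initializer returns nil if the locale is not supported.
        guard let recognizer = SFSpeechRecognizer(locale: locale),
              recognizer.isAvailable else {
            completion(.failure(NSError(domain: "TranscribeSketch", code: 2,
                userInfo: [NSLocalizedDescriptionKey: "Recognizer unavailable for \(locale.identifier)"])))
            return
        }
        let request = SFSpeechURLRecognitionRequest(url: fileURL)
        // Only deliver the final transcription, not intermediate guesses.
        request.shouldReportPartialResults = false
        // Keep audio on the device when this recognizer supports it.
        if recognizer.supportsOnDeviceRecognition {
            request.requiresOnDeviceRecognition = true
        }
        _ = recognizer.recognitionTask(with: request) { result, error in
            if let error = error {
                completion(.failure(error))
            } else if let result = result, result.isFinal {
                completion(.success(result.bestTranscription.formattedString))
            }
        }
    }
}
```

Call it from the app with a file URL and handle the `Result`. The recognition handler runs on the recognizer's `queue`, which defaults to the main queue, so avoid blocking the main thread while waiting for it. For files longer than about a minute, you may need to split the audio and transcribe each piece separately.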

score -2 · Answer 3 · answered Oct 30 '13 at 16:16

The workaround I use with Dragon Dictate is a USB headset with a microphone. I listen to the file I want to transcribe and repeat what I hear. It's kludgy, but it works, and it should work with Dictation as well. It helps if you can play the file into your headset at a slower speed, which gives you time to process what you are hearing and repeat it clearly.

