I am trying to use VOSK to transcribe voice input from my M1 MacBook Air microphone. The program runs without errors, but it gets no input from the microphone. If I use the larger English model and pass `exception_on_overflow=False` to `stream.read`, it sometimes, after 20 seconds or so, hears the word "the" and prints it.

from vosk import Model, KaldiRecognizer
import pyaudio

model = Model(r'/Users/myname/Desktop/AVIS/vosk-model-small-en-us-0.15')
# Alternative model: vosk-model-en-us-0.22-lgraph

recognizer = KaldiRecognizer(model, 16000)

cap = pyaudio.PyAudio()
stream = cap.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
stream.start_stream()

while True:
    data = stream.read(4096)  # , exception_on_overflow=False)

    if recognizer.AcceptWaveform(data):
        print(recognizer.Result())

I copied this code by hand from several different video tutorials on getting started with VOSK.

Originally I was using the larger VOSK English speech model, but I would get the error `OSError: [Errno -9981] Input overflowed`. As soon as I switched to the smaller English model, in an attempt to get it to recognize anything, the error went away.
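
For context on the overflow, here is a rough sketch of the timing involved. It assumes the overflow happens when one pass through the loop (the read plus `AcceptWaveform`) takes longer than the audio that arrives in the meantime, which would fit the slower, larger model triggering it. The helper `buffer_seconds` is hypothetical, and how much audio PortAudio really buffers internally is an assumption here.

```python
def buffer_seconds(frames: int, rate: int) -> float:
    """Duration of audio, in seconds, covered by `frames` frames at `rate` Hz."""
    return frames / rate


RATE = 16000          # sample rate used in the code above
READ_FRAMES = 4096    # frames requested per stream.read() call
BUFFER_FRAMES = 8192  # frames_per_buffer passed to open()

read_s = buffer_seconds(READ_FRAMES, RATE)
buffer_s = buffer_seconds(BUFFER_FRAMES, RATE)

# If one loop iteration regularly takes longer than read_s, unread audio
# accumulates (assumption: the buffer size is on the order of buffer_s).
print(f"each read covers {read_s:.3f} s; buffer holds {buffer_s:.3f} s")
# prints: each read covers 0.256 s; buffer holds 0.512 s
```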

I tried changing the PyAudio format from `paInt16` to `paFloat32` (32-bit float, as shown in the Mac's Audio MIDI Setup), but that made no difference either.
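
One way to check whether the stream is delivering anything other than silence, independent of VOSK, would be to measure the level of each chunk returned by `stream.read`. The sketch below is only an illustration: `rms_int16` is a hypothetical helper, and it assumes the data is 16-bit signed mono PCM in native byte order, which is what `paInt16` with `channels=1` should produce.

```python
import array
import math


def rms_int16(data: bytes) -> float:
    """Root-mean-square level of a chunk of 16-bit signed PCM samples."""
    samples = array.array("h")  # "h" = signed 16-bit, native byte order
    samples.frombytes(data)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Synthetic chunks standing in for what stream.read(4096) would return.
silence = bytes(4096 * 2)                                # all-zero samples
square = array.array("h", [1000, -1000] * 2048).tobytes()

print(rms_int16(silence))  # 0.0
print(rms_int16(square))   # 1000.0
```

Calling `print(rms_int16(data))` right after `stream.read(4096)` in the loop would show the level per chunk. Values that stay at or near zero while speaking would suggest the stream is receiving silence, for example because a different input device is selected or because the terminal has not been granted microphone access in macOS (both are guesses, not confirmed causes).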
