Convert voice to text while talking in python

Question

I made a program that lets me speak and converts my speech to text. However, it only converts my voice after I stop talking. What I want is to convert my voice to text while I am still talking.

See this video at 2:31: https://www.youtube.com/watch?v=96AO6L9qp2U&t=2s&ab_channel=StormHack

Pay attention to the top-right corner of Tony's monitor: it converts his voice to text while he is talking. I want to do the same thing. Can it be done?

This is my whole program:

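import speech_recognition as sr  # sr.Microphone needs PyAudio installed; no direct import required


r = sr.Recognizer()
with sr.Microphone() as source:
    print("Listening...")
    # listen() returns only after it detects a pause, i.e. once you stop talking
    audio = r.listen(source)
    try:
        text = r.recognize_google(audio)
        print("You said : {}".format(text))
    except (sr.UnknownValueError, sr.RequestError):
        # UnknownValueError: speech was unintelligible; RequestError: the API call failed
        print("Sorry could not recognize what you said")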

Any solution, tips, or hints would be greatly appreciated. Thank you in advance.

I assume you have looked through the docs here: https://github.com/Uberi/speech_recognition — Red Cricket, Dec 11 '18 at 04:50

score 1 · Answer 1 · answered May 16 '19 at 15:35

To do this, you will need what's called VAD: Voice Activity Detection. A simple approach is to take a set of audio samples and measure their intensity. If the intensity rises above a certain threshold, start recording; once it stays below a threshold for a given period of time, stop the recording and send it off to the speech-to-text service. You can find an example of this here.
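
A minimal sketch of the threshold idea, written in plain Python so it runs without a microphone. It assumes the audio has already been split into frames of float samples in [-1, 1]; the function names and the threshold and hangover parameters are illustrative, not part of any library. Using a lower stop threshold than start threshold (hysteresis) keeps brief dips in volume from ending the recording. For comparison, speech_recognition's own Recognizer.listen already does a simpler form of this, controlled by its energy_threshold and pause_threshold attributes.

```python
import math
from typing import List, Sequence, Tuple


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square intensity of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_utterances(
    frames: Sequence[Sequence[float]],
    start_threshold: float,
    stop_threshold: float,
    hangover_frames: int,
) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) index pairs, one per detected utterance.

    Recording starts when a frame's RMS exceeds start_threshold and stops once
    the RMS has stayed below stop_threshold for hangover_frames frames in a row.
    """
    utterances = []
    recording = False
    start = 0
    quiet_run = 0  # consecutive quiet frames seen while recording
    for i, frame in enumerate(frames):
        level = rms(frame)
        if not recording:
            if level > start_threshold:
                recording, start, quiet_run = True, i, 0
        elif level < stop_threshold:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                # the utterance ended on the last frame before the quiet run
                utterances.append((start, i - quiet_run))
                recording = False
        else:
            quiet_run = 0  # still speaking (or between thresholds): keep recording
    if recording:  # audio ended while still recording
        utterances.append((start, len(frames) - 1 - quiet_run))
    return utterances


loud = [0.5, -0.5] * 80     # 160 samples with RMS 0.5
quiet = [0.01, -0.01] * 80  # 160 samples with RMS 0.01
frames = [quiet] * 3 + [loud] * 4 + [quiet] * 5 + [loud] * 2
print(detect_utterances(frames, start_threshold=0.1, stop_threshold=0.05, hangover_frames=3))
# → [(3, 6), (12, 13)]
```

Each returned span is what you would send off to the recognition service.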

More sophisticated systems use better heuristics to decide whether the user is speaking, such as the frequency content of the audio, and apply techniques like noise reduction. Other systems, such as DeepSpeech 2, can transcribe speech live while the user is still talking.
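
The answer does not name a specific frequency heuristic. One classic and very crude one is the zero-crossing rate: voiced speech tends to change sign far less often than hiss-like noise. The sketch below combines it with the intensity check; it is an illustration under that assumption, not what DeepSpeech 2 or other production systems do, and the function names and thresholds are hypothetical.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square intensity of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (a rough frequency proxy)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def looks_like_speech(frame: Sequence[float], energy_threshold: float, max_zcr: float) -> bool:
    # Hypothetical heuristic: loud enough AND not dominated by high-frequency content
    return rms(frame) > energy_threshold and zero_crossing_rate(frame) < max_zcr


sr_hz = 16000
voiced = [0.5 * math.sin(2 * math.pi * 200 * n / sr_hz) for n in range(320)]  # 20 ms of a 200 Hz tone
hiss = [0.5, -0.5] * 160  # sign flips every sample: maximal zero-crossing rate
print(looks_like_speech(voiced, 0.1, 0.3), looks_like_speech(hiss, 0.1, 0.3))  # → True False
```

Real systems typically look at the full spectrum rather than a single number like this.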

I appreciate you answering my question! Did you watch the video I've provided? If I do exactly what you said, will my program convert my voice into text while I am speaking? — WeeeHaaa, May 23 '19 at 02:35
No, but if you do what I said, your program will work similarly to Siri or Alexa. — 0x777C, May 23 '19 at 06:16

score 0 · Answer 2 · answered Jun 23 '21 at 14:39

To do what you want, you need to listen not for a complete sentence but for just a few words at a time. You then process that chunk of audio and print the result, repeating this while you keep talking. Here is a very basic implementation of it:

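import speech_recognition as sr
import threading
from queue import Queue

listen_recognizer = sr.Recognizer()
process_recognizer = sr.Recognizer()

# Audio chunks captured by the background listener, waiting to be transcribed
audios_to_process = Queue()

def callback(recognizer, audio_data):
    # Called from the background listening thread for every captured phrase
    audios_to_process.put(audio_data)

def listen():
    source = sr.Microphone()
    # phrase_time_limit=3: cut each phrase after at most 3 seconds,
    # so text appears while you are still talking
    stop_listening = listen_recognizer.listen_in_background(source, callback, phrase_time_limit=3)
    return stop_listening

def process_thread_func():
    while True:
        audio = audios_to_process.get()  # blocks until a chunk is available
        if audio is None:  # sentinel: time to stop
            break
        try:
            text = process_recognizer.recognize_google(audio)
        except (sr.UnknownValueError, sr.RequestError):
            continue  # nothing intelligible in this chunk, or the API call failed
        print(text)

stop_listening = listen()
process_thread = threading.Thread(target=process_thread_func)
process_thread.start()

input()  # press Enter to stop

stop_listening()               # waits for the background listener to stop
audios_to_process.put(None)    # tell the processing thread to finish
process_thread.join()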

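As you can see, I use two recognizers running in two threads, so one is always listening while the other processes the audio data. The listener records a phrase of at most 3 seconds (the 3 passed to listen_in_background is its phrase_time_limit), puts the audio in a queue, and starts listening again. At the same time, the processing thread takes audio from the queue, converts it to text, and prints it. Because the phrases are capped at a few seconds, the text shows up while you are still talking, at the cost of occasionally splitting a word across two chunks.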
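
The queue hand-off between the listening side and the transcribing side can be exercised without a microphone. In this sketch the microphone is replaced by a list of byte strings and recognize_google by a hypothetical fake_transcribe; both substitutions are assumptions for illustration. It shows a blocking Queue.get() instead of polling with time.sleep(), and a None sentinel so the worker thread exits and the program can terminate.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional

SENTINEL = None  # put on the queue to tell the worker to stop


def worker(chunks: queue.Queue, transcribe: Callable[[bytes], Optional[str]], out: List[str]) -> None:
    while True:
        chunk = chunks.get()  # blocks until something is queued; no sleep() polling
        if chunk is SENTINEL:
            break
        text = transcribe(chunk)
        if text:  # None or "" means the chunk was not understood; skip it
            out.append(text)


def transcribe_stream(captured: Iterable[bytes], transcribe: Callable[[bytes], Optional[str]]) -> List[str]:
    chunks: queue.Queue = queue.Queue()
    out: List[str] = []
    t = threading.Thread(target=worker, args=(chunks, transcribe, out))
    t.start()
    for chunk in captured:  # in the real program, the listen_in_background callback does this
        chunks.put(chunk)
    chunks.put(SENTINEL)
    t.join()  # the worker drains everything queued before the sentinel, then exits
    return out


def fake_transcribe(chunk: bytes) -> Optional[str]:
    # Hypothetical stand-in for recognize_google: empty chunks count as "not understood"
    return chunk.decode() or None


print(transcribe_stream([b"hello", b"", b"world"], fake_transcribe))  # → ['hello', 'world']
```

With a single worker thread the text comes out in the order it was captured; adding more workers would speed up transcription but could reorder the output.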
