How to improve the accuracy for speech-to-text conversion using recognize_sphinx API in Python

Question

How can we improve the accuracy of speech to text conversion using recognize_sphinx API in Python?

The code below is what I have so far. How can I improve its accuracy?

import speech_recognition as sr

# Obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

# Use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # Read the entire audio file

# Recognize speech using Sphinx
try:
    print("Sphinx thinks you said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

I am having the same issue with English voice file. Have you found a solution? — Boris Modylevsky, May 02 '22 at 12:06

Scircia · Answer 1 · 2022-11-23T15:42:33.283

So, if I'm understanding you correctly, you're having trouble getting the right output for what the user, or in your case the audio file, has said. E.g. the audio/user says "Hi there!" but the output is "Something completely different".

Reviewing your code, I noticed you're using three different audio files, each spoken in a different language. If you open the SpeechRecognition documentation, you'll see that there is a library reference, and that it contains notes on using PocketSphinx. The first thing that stands out is:

By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are also available, but not included due to the files being too large

I'll assume you have installed all the needed packages for this. I'm not going to explain that part because it's pretty self-explanatory. Anyway, the documentation also explains how to select a language:

Once installed, you can simply specify the language using the language parameter of recognizer_instance.recognize_sphinx. For example, French would be specified with "fr-FR" and Mandarin with "zh-CN".

I am not sure if the code above is yours, or if you just copied and pasted it from somewhere. Either way, there is an issue with it: you keep overwriting your AUDIO_FILE variable with another file. So instead of obtaining the path to "english.wav" in the same folder as this script, as the comment says, you end up with the path to "chinese.flac".

Now, I guess you can already see what the problem with the "accuracy for speech to text" might be. Sphinx is "listening" to Chinese and trying to output it as English words, because no language was specified and it defaults to US English.

To fix this, just add a language parameter and set it to the language spoken in the audio file. E.g.:

import speech_recognition as sr

# Obtain path to "chinese.flac" in the same folder as this script
from os import path

# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff")
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac")

# Use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # Read the entire audio file

# Recognize speech using Sphinx
try:
    # Just pass a language parameter
    print("Sphinx thinks you said " + r.recognize_sphinx(audio, language="zh-CN"))
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
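
If you actually want to transcribe all three files rather than only the last one, one option is to pair each file with its language code and loop over the pairs instead of reassigning AUDIO_FILE. This is only a rough sketch, assuming the French and Mandarin language packs are installed; the names LANGUAGE_BY_FILE, build_jobs and transcribe are hypothetical helpers made up for this example, not part of SpeechRecognition:

```python
from os import path

# Hypothetical mapping: each sample file paired with the language spoken in it.
# "en-US" is the Sphinx default; "fr-FR" and "zh-CN" need the extra language packs.
LANGUAGE_BY_FILE = {
    "english.wav": "en-US",
    "french.aiff": "fr-FR",
    "chinese.flac": "zh-CN",
}


def build_jobs(base_dir, language_by_file=LANGUAGE_BY_FILE):
    """Return a list of (audio_path, language) pairs, one per file."""
    return [
        (path.join(base_dir, name), language)
        for name, language in language_by_file.items()
    ]


def transcribe(audio_path, language):
    """Transcribe a single audio file with Sphinx in the given language."""
    # Imported here so build_jobs can be used without the package installed
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = r.record(source)  # Read the entire audio file
    # May raise sr.UnknownValueError or sr.RequestError, as in the code above
    return r.recognize_sphinx(audio, language=language)
```

You would then call transcribe(audio_path, language) for each pair returned by build_jobs(path.dirname(path.realpath(__file__))), wrapped in the same try/except as above.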
