Getting started with Speech Recognition and Sphinx

Question

Sphinx seems to be the only real option for Java speech recognition. Documentation is sparse, and it requires a high level of domain knowledge. I used their example starter program, and it works for one file but not for another, extremely similar file. What is the difference? What is the secret to getting it to work more accurately?

This file, https://www.opdsupport.com/downloads/miscellaneous/sample-audio-files/52-welcome-wav/download , works, but this one, https://www.opdsupport.com/downloads/miscellaneous/sample-audio-files/49-longwelcome-wav/download , does not.
I noticed that the non-working file had a different sample rate, so I used a program to convert it to 16000 Hz, but still no luck.

What about mono vs. stereo? See https://cmusphinx.github.io/wiki/tutorialsphinx4/#streamspeechrecognizer:~:text=Please%20note%20that%20the%20audio%20for%20this%20decoding%20must%20have%20one%20of%20the%20following%20formats – PaulProgrammer, Jan 21 '21 at 21:49
Yes, I also noticed that the file that didn't work had 2 channels, so I also converted it to 1 channel, and it still doesn't seem to work. (I would attach the modified file, but there doesn't seem to be a way to do that.) – Peter Kronenberg, Jan 22 '21 at 01:19
@PaulProgrammer Turns out you're right. The conversion method I used didn't work. I tried converting with SoX to a 16000 Hz sample rate and 1 channel, and that fixed the problem. I was also able to convert an MP3 to a WAV file. – Peter Kronenberg, Jan 22 '21 at 19:03

score 1 · Accepted Answer · answered Jan 22 '21 at 20:57

Make sure to inspect the file carefully. According to the docs, your file must be either 8 kHz or 16 kHz, and mono only. There are many tools available to do this -- I use Audacity, but that is probably overkill for a basic conversion like this.
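
If you would rather check the file from Java than open it in an audio editor, something like the following might help. It is only a sketch using the standard `javax.sound.sampled` API; the class name `WavFormatCheck` and the helper `isSphinxCompatible` are made up for illustration, and the 16-bit signed little-endian PCM check is my assumption based on the format list in the Sphinx4 tutorial, not something stated above.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

public class WavFormatCheck {

    // Hypothetical helper: true when the format is mono, 8 kHz or 16 kHz,
    // and (assumption taken from the Sphinx4 tutorial) 16-bit signed
    // little-endian PCM.
    static boolean isSphinxCompatible(AudioFormat format) {
        float rate = format.getSampleRate();
        boolean rateOk = rate == 8000f || rate == 16000f;
        return rateOk
                && format.getChannels() == 1
                && format.getSampleSizeInBits() == 16
                && AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && !format.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Read only the header information; the audio data itself is not decoded here.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat format = in.getFormat();
            System.out.println("Detected format: " + format);
            System.out.println(isSphinxCompatible(format)
                    ? "Looks compatible"
                    : "Needs conversion (sample rate, channels, or encoding)");
        }
    }
}
```

Running this against both sample files should show which property differs (sample rate, channel count, or encoding) before you hand the file to the recognizer.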

PaulProgrammer


Yes, I'm understanding more and more. It seems like Sphinx only supports WAV files, so essentially all files must be converted to WAV, is that right? I basically want to be able to support any audio file. So it sounds like you have a working system with Sphinx? If you've got any other advice, I'd love to hear it, especially regarding performance and which models to use. It seems there are other models besides the ones that come with it, and it's unclear which is the 'best' one. – Peter Kronenberg Jan 22 '21 at 23:02

It would not be uncommon to have a processing pipeline that starts with a set of known file types and then uses a utility like `ffmpeg` to convert the files to WAV before processing. No idea about "best" models -- it seems to me that Sphinx has some "good enough" models, but also encourages you to create your own from your use cases. – PaulProgrammer Jan 24 '21 at 17:45
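
As a rough illustration of that kind of pipeline step, the sketch below shells out to `ffmpeg` from Java. It assumes `ffmpeg` is installed and on the `PATH`; the class name `FfmpegToWav` and its methods are hypothetical, and the 16 kHz mono 16-bit PCM target is an assumption based on the format discussed in this thread.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class FfmpegToWav {

    // Builds the ffmpeg argument list: overwrite the output (-y), resample
    // to 16 kHz (-ar), downmix to mono (-ac), and write 16-bit little-endian PCM.
    static List<String> buildCommand(Path input, Path output) {
        return Arrays.asList(
                "ffmpeg", "-y",
                "-i", input.toString(),
                "-ar", "16000",
                "-ac", "1",
                "-c:a", "pcm_s16le",
                output.toString());
    }

    // Runs ffmpeg and fails loudly if it exits with a non-zero status.
    static void convert(Path input, Path output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output))
                .inheritIO() // forward ffmpeg's own output so its pipes cannot fill up and block
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ffmpeg exited with status " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java FfmpegToWav <input> <output.wav>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Keeping the command construction separate from the process launch makes it easy to log or unit-test the exact arguments without needing `ffmpeg` available.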
