How to convert voice to text?

Question

I am trying to convert a WAV file to a text file using sphinx4. Is it possible to recognize words that are not included in the grammar file?

No, in the same way that it is not possible to find information that is not anywhere to be found :). – Augusto, Sep 13 '11 at 13:27
Thank you. Is there any other open source tool that can recognize speech and convert it into a text file? – RAAAAM, Sep 14 '11 at 05:58
The only one I've used is Autonomy Softsound (which is pretty good), but as with all these tools, you need to train the engine with the grammar you want to use. You can use the default grammar, but expect it to understand only standard language (including pronunciation). For example, the default en-uk language module (as it's known in Softsound) will produce a substandard result with a US speaker. Softsound has all kinds of clever features to detect languages and localizations, so it can do a better job. – Augusto, Sep 14 '11 at 09:10
Thanks again. Is Softsound open source? Do you have any example that recognizes speech and converts it into a text file without using a grammar file? I tried many things in sphinx4, but it won't accept any vocabulary unless I add the words to the grammar file. – RAAAAM, Sep 14 '11 at 09:49
Softsound is not open source, and it's quite expensive as far as I know :S. – Augusto, Sep 14 '11 at 10:21

score 1 · Answer 1 · answered Sep 14 '11 at 10:04

It is a common misconception that you must work without any grammar at all to recognize speech efficiently and solve the application task. Instead, the practical approach is to pick a solution that works and decodes your files.

If you are not sure about the domain or the language, you can always use a very generic language model assisted by a syllable-based grammar to decode unknown words. It is common to then use web queries to turn the syllable-based variants into words, which allows the system to acquire vocabulary. That will give you a good result for very generic types of speech.

Sphinx-4 supports that.

Nikolay Shmyrev


Thank you. I have tried the Hello world and HelloNGram examples from the demo files. Both use a grammar file to recognize speech. What can I do if I want to work without a grammar file? Do you have an example of that? Please guide me; I am new to this concept. – RAAAAM Sep 14 '11 at 10:53
You can try LatticeDemo for an example of large vocabulary continuous speech recognition. You can learn more about CMUSphinx and the concepts behind it by reading the tutorial: http://cmusphinx.sourceforge.net/wiki/tutorial – Nikolay Shmyrev Sep 16 '11 at 10:19
Thank you. I tried LatticeDemo for continuous recognition, but whenever I upload my voice to that application, I can't get a single word recognized correctly. Which English accent does Sphinx recognize? I uploaded speech in a UK accent. – RAAAAM Sep 16 '11 at 10:41
Most likely the issue is not the accent but a mismatch in the acoustic format and in the language model. You can get a more definite answer if you post the audio sample you are trying to recognize and your expected results. See also the FAQ entry: http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor – Nikolay Shmyrev Sep 16 '11 at 10:50
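
One way to check for an acoustic format mismatch before decoding is to read the WAV header with the standard Java Sound API. This is only a sketch: the class name `WavFormatCheck` and the helper `findMismatches` are made up for illustration. The expected format (16 kHz, 16-bit, mono, little-endian signed PCM) is an assumption based on what the stock wideband CMUSphinx models are usually described as expecting, so confirm it against the model you actually use.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of sphinx4.
class WavFormatCheck {
    // Assumption: the acoustic model expects 16 kHz, 16-bit, mono,
    // little-endian signed PCM. Adjust these for your model.
    static final float EXPECTED_RATE = 16000f;
    static final int EXPECTED_BITS = 16;
    static final int EXPECTED_CHANNELS = 1;

    // Returns one message per property that differs from the expected format.
    static List<String> findMismatches(AudioFormat f) {
        List<String> problems = new ArrayList<>();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
            problems.add("encoding is " + f.getEncoding() + ", expected PCM_SIGNED");
        }
        if (f.getSampleRate() != EXPECTED_RATE) {
            problems.add("sample rate is " + f.getSampleRate() + " Hz, expected " + EXPECTED_RATE);
        }
        if (f.getSampleSizeInBits() != EXPECTED_BITS) {
            problems.add("sample size is " + f.getSampleSizeInBits() + " bits, expected " + EXPECTED_BITS);
        }
        if (f.getChannels() != EXPECTED_CHANNELS) {
            problems.add("channel count is " + f.getChannels() + ", expected " + EXPECTED_CHANNELS);
        }
        if (f.isBigEndian()) {
            problems.add("byte order is big-endian, expected little-endian");
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // Only the header is inspected; no audio is decoded here.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            List<String> problems = findMismatches(in.getFormat());
            if (problems.isEmpty()) {
                System.out.println("Format matches the assumed model input.");
            } else {
                problems.forEach(System.out::println);
            }
        }
    }
}
```

If the check reports a mismatch (for example a 44.1 kHz stereo recording), resample and downmix the file with an audio tool before passing it to the recognizer. A matching format rules out only one of the two causes mentioned above; the language model can still be a poor fit for the speech.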
