I am trying to develop a voice-based application that accepts user input as speech and performs actions based on that input. This is my first venture into this technology, and I am learning as I go.
I am using Microsoft SAPI, as shipped with .NET 4, to recognize speech. So far, I have learned about the two modes it supports.
Speech recognition (SR) has two modes of operation:
Dictation mode — an unconstrained, free-form speech interpretation mode that uses a built-in grammar provided by the recognizer for a specific language. This is the default recognizer.
Grammar mode — matches spoken words to one or more specific context-free grammars (CFGs). A CFG is a structure that defines a specific set of words, and the combination of these words that can be used. In basic terms, a CFG defines the sentences that are valid for SR. Grammars must be supplied by the application in the form of precompiled grammar files or supplied at runtime in the form of W3C Speech Recognition Grammar Specification (SRGS) markup or the older CFG specification. The Windows SDK includes a grammar compiler: gc.exe.
So essentially, the engine recognizes only the words I specify in the grammar. However, I also want to capture some free-form text alongside the structured grammar, for example people's names. To capture a name from speech, that name has to be specified within the grammar, which is not feasible if the application is open for anyone to use.
Is there a way to extract text that is not already part of the grammar?
How can I get the system to recognize sentences such as "My name is Gary and I am 25 years old"? The name can be absolutely anything, so how do I define it in my grammar?
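To make the problem concrete, here is a minimal SRGS sketch of the kind of grammar I can write today for that sentence. The rule ids (`introduction`, `name`, `age`) and the listed names and ages are placeholders I made up for illustration, and I have not verified this exact file against the recognizer. The `name` rule is the part I cannot enumerate in advance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: rule ids and list entries are hypothetical. -->
<grammar version="1.0" xml:lang="en-US" root="introduction"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Top-level sentence: "my name is <name> and I am <age> years old" -->
  <rule id="introduction" scope="public">
    <item>my name is</item>
    <ruleref uri="#name"/>
    <item>and I am</item>
    <ruleref uri="#age"/>
    <item>years old</item>
  </rule>

  <!-- The limitation: every accepted name has to be listed here. -->
  <rule id="name">
    <one-of>
      <item>Gary</item>
      <item>Alice</item>
      <item>Bob</item>
    </one-of>
  </rule>

  <!-- Ages can be enumerated in full; only a few are shown. -->
  <rule id="age">
    <one-of>
      <item>twenty four</item>
      <item>twenty five</item>
      <item>twenty six</item>
    </one-of>
  </rule>
</grammar>
```

With a grammar like this, "My name is Gary and I am twenty five years old" should match, but any name outside the `name` list would not, which is exactly what I am trying to get around.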