I am trying to develop a voice-based application that accepts user input as speech and performs actions based on that input. This is my first venture into this technology, and I am learning while developing it.

I am using the Microsoft SAPI shipped with .NET 4 to recognize speech. So far, I have learned about the two modes it supports.

Speech recognition (SR) has two modes of operation:

  • Dictation mode — an unconstrained, free-form speech interpretation mode that uses a built-in grammar provided by the recognizer for a specific language. This is the default recognizer.

  • Grammar mode — matches spoken words to one or more specific context-free grammars (CFGs). A CFG is a structure that defines a specific set of words, and the combination of these words that can be used. In basic terms, a CFG defines the sentences that are valid for SR. Grammars must be supplied by the application in the form of precompiled grammar files or supplied at runtime in the form of W3C Speech Recognition Grammar Specification (SRGS) markup or the older CFG specification. The Windows SDK includes a grammar compiler: gc.exe.

So essentially, the engine recognizes only the words I specify in the grammar. But I also want to include some free-form text along with the structured grammar, for example people's names. To capture a name from speech, I would need to have that name specified within the grammar, but that's not possible if the application is open for anyone to use.

Is there a way I can extract text that is not already part of the grammar?

How can I get the system to recognize sentences such as "My name is Gary and I am 25 years old"? The name can be absolutely anything, so how do I define it in my grammar?

Danish Khan

2 Answers

You can mix dictation mode with grammar mode. See this example from MSDN:

http://msdn.microsoft.com/en-us/library/ms723634(v=vs.85).aspx

<GRAMMAR>
    <!-- command to handle first and last names with semantic properties -->
    <!-- By using semantic properties, the application can ignore all of
        the text returned, except for the text associated with the dictation
        tags' semantic properties "PID_FirstName" and "PID_LastName" -->
    <RULE ID="SubmitName" TOPLEVEL="ACTIVE">
        <P>
            my first name is
            <!-- Note the implicit maximum is only one word -->
            <DICTATION PROPID="PID_FirstName"/>
            and my last name is
            <!-- Note the explicit maximum (MAX="2") is two words -->
            <DICTATION PROPID="PID_LastName" MAX="2"/>
        </P>
    </RULE>
</GRAMMAR>
Nikolay Shmyrev
  • Thanks for the pointers. The grammar mentioned here doesn't seem to be `SRGS` compatible, but now at least I have an approach to consider. – Danish Khan Nov 04 '11 at 09:17

Take a look at the GARBAGE special rule. I'm not sure how you would then retrieve the words that matched the garbage section, but I'm pretty sure there is a way.
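
For reference, here is a minimal sketch of what that could look like in W3C SRGS XML for the sentence from the question. The rule name `introduction` is made up for illustration, and I'm assuming the recognizer honors the standard `GARBAGE` special rule. I have not verified how (or whether) the engine exposes the words that matched it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "introduction" is a hypothetical rule name. -->
<grammar version="1.0" xml:lang="en-US" root="introduction"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="introduction" scope="public">
    my name is
    <!-- GARBAGE matches any speech up to the next token ("and") -->
    <ruleref special="GARBAGE"/>
    and I am
    <!-- Same idea for the age; a digits rule would be stricter -->
    <ruleref special="GARBAGE"/>
    years old
  </rule>
</grammar>
```

The fixed phrases anchor the sentence, so the free-form parts are whatever falls between them. Retrieving the actual spoken name from the result is the part that still needs checking against the engine's documentation.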

Let me know if you figure it out, as I'm interested in the subject too :).

Juan