I'm trying to build a VoiceXML interface to a machine translation system. Most of the menu design is simple enough, but when the user actually says the phrase to be translated, I need to be able to accept whatever text comes from the ASR without trying to match it to a finite grammar. Is there a standard way to do this in VoiceXML?
2 Answers
If by "standard way" you mean VoiceXML with SRGS/SISR, you could build a grammar that contains every word of the target language, plus the SISR to reassemble the content into a slot. That is not a practical solution, but it is a possible one within the constraints of the specifications.
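As a rough sketch of what that grammar could look like (not a practical one), here is an SRGS XML grammar with an SISR tag. The rule names `phrase` and `word`, the slot name `text`, and the two sample words are assumptions made for illustration; a real grammar would need one item per word of the language.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="phrase"
         tag-format="semantics/1.0">

  <!-- Root rule: one or more words in any order -->
  <rule id="phrase" scope="public">
    <item repeat="1-">
      <ruleref uri="#word"/>
    </item>
    <!-- SISR: copy the matched token sequence into a single slot
         (the slot name "text" is an assumption) -->
    <tag>out.text = meta.current().text;</tag>
  </rule>

  <!-- Vocabulary rule: illustrative only, two sample words -->
  <rule id="word">
    <one-of>
      <item>hello</item>
      <item>world</item>
      <!-- one item per word of the language would go here -->
    </one-of>
  </rule>
</grammar>
```

With a vocabulary of realistic size, recognition accuracy and performance would likely suffer, which is why this stays a theoretical option.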
If you are looking only at VoiceXML itself, the sole constraint is building the capability into the browser, since VoiceXML places no relevant restrictions on how `application.lastresult$` is populated.
Knowing your implementation constraints and what you are trying to achieve would help in working out a practical solution.

Standard VoiceXML does not allow you to capture free text, because you always use a grammar with strict rules, so what you are planning falls outside the original scope of the specification. If you can control your VoiceXML interpreter implementation, you can use the same method we do. In our Voximal VoiceXML interpreter we solve this with a builtin grammar:
<field name="text" type="text"> : this uses builtin:grammar/text
You can extend it by adding parameters such as "text?lang=en-US" or "text?model=MyWatsonModel". The text result is stored in the field variable, and extra values can be added in the shadow variables. All of this is platform-dependent and outside the scope of the VoiceXML standard, but I think it is the best way to integrate speech-to-text into VoiceXML.
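A minimal sketch of how such a field might sit in a complete document, assuming an interpreter that supports the non-standard `type="text"` builtin described above. The form id, the prompt wording, and the `http://example.com/translate` URL are placeholders, and which shadow variables are actually filled in depends on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="translate">
    <!-- type="text?lang=en-US" is platform-specific (not a standard
         VoiceXML builtin type); it asks for free-form speech-to-text -->
    <field name="text" type="text?lang=en-US">
      <prompt>Please say the phrase you want translated.</prompt>
      <filled>
        <!-- text$.confidence is a standard field shadow variable;
             whether the platform populates it here is an assumption -->
        <var name="confidence" expr="text$.confidence"/>
        <!-- Hypothetical endpoint for the machine translation back end -->
        <submit next="http://example.com/translate"
                namelist="text confidence" method="post"/>
      </filled>
    </field>
  </form>
</vxml>
```

On an interpreter without this builtin, the field type would not be recognized, so the document is not portable.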
