VoiceXML - Recognize DTMF in Recording

Question

I've been doing IVR work for a while, but we have a case where I'd love some expertise/feedback:

Is it possible to record a message where the user could press a DTMF tone to indicate a pause where we would insert our own sound? In this scenario, the user would record something like: "Good Morning, [DTMF], please call the office at [DTMF] to reconcile your account."

Not sure whether we would chop the resulting WAV file into pieces to insert our variables, or do some post-processing before sending out our message.

Does anyone have any experience with something like this?

Thanks

Jim Stanley Blackboard Connect

What VoiceXML platform are you using? – gawi Mar 19 '13 at 21:18

score 1 · Answer 1 · edited May 23 '17 at 11:57

In VoiceXML you would use a record element to record a message from a user. The record element has an attribute called dtmfterm. If it is set to true (the default), a DTMF key press terminates the recording. If it is set to false, recording is terminated when the maxtime setting is reached or when silence lasts for the duration of finalsilence, and any DTMF tones simply become part of the recording.
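
To make those attributes concrete, here is a rough Python sketch that renders a minimal VoiceXML 2.0 document with a single record item. The function name `build_record_form`, the form id, the field name `msg`, the prompt text and the default durations are all made up for illustration, and your platform may expect extra attributes or different defaults.

```python
import xml.etree.ElementTree as ET


def build_record_form(dtmfterm: bool, maxtime: str = "60s", finalsilence: str = "3s") -> str:
    """Render a minimal VoiceXML 2.0 page with one <record> item (hypothetical helper)."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "record_message"})  # id is illustrative
    record = ET.SubElement(form, "record", {
        "name": "msg",                                # illustrative field name
        "beep": "true",
        "dtmfterm": "true" if dtmfterm else "false",  # true: a key press ends the recording
        "maxtime": maxtime,                           # hard limit on recording length
        "finalsilence": finalsilence,                 # trailing silence that ends the recording
    })
    prompt = ET.SubElement(record, "prompt")
    prompt.text = "Record your message after the beep."
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_record_form(dtmfterm=False))
```

Switching `dtmfterm` between true and false in a page like this is a quick way to check how your particular platform behaves.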

I have created applications that use caller created recordings but never one that manipulates the recordings like in your requirements. What you may be able to do is concatenate recordings together. Here is a QA that shows how to concatenate wav recordings using C#.

What you will have to experiment with is whether you can catch which DTMF key was pressed by using grammars. The spec alludes to this, but it may be somewhat specific to the VoiceXML IVR platform that you are using. If you know which DTMF key was used, you can instruct the user to press * to insert silence and # to terminate recording. Both keys terminate the current recording, but the logic in your VoiceXML goes right back into recording if * was pressed and stops the recording process completely if # was pressed. Then you would use the concatenation to string these recordings together, inserting a WAV file of pre-recorded silence between the user's recorded snippets.
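
The concatenation step could look roughly like the sketch below. It is written in Python with the standard `wave` module rather than the C# from the linked Q&A, and it assumes that every file is uncompressed PCM WAV with the same channel count, sample width and sample rate. Telephony recordings are often mu-law or A-law, which `wave` will refuse to open, so you may need to convert them first. The function name `concat_with_filler` and the file names are hypothetical.

```python
import wave


def concat_with_filler(segment_paths, filler_path, out_path):
    """Join recorded segments, placing the filler clip between each pair."""
    # Build the playback order: seg1, filler, seg2, filler, seg3, ...
    order = []
    for i, path in enumerate(segment_paths):
        if i > 0:
            order.append(filler_path)
        order.append(path)
    if not order:
        raise ValueError("no segments given")

    params = None
    chunks = []
    for path in order:
        with wave.open(path, "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                # Raw frames can only be appended when the formats match.
                raise ValueError(f"{path} has a different audio format: {fmt} != {params}")
            chunks.append(wav.readframes(wav.getnframes()))

    with wave.open(out_path, "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        for chunk in chunks:
            out.writeframes(chunk)
```

A call such as `concat_with_filler(["part1.wav", "part2.wav"], "silence.wav", "message.wav")` would produce part1, the filler, then part2. The filler does not have to be silence; it could just as well be the variable audio you want to insert.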

From the tags it looks like you are using C# and MVC for your VoiceXML application. There is an open source project called VoiceModel that makes it easier to develop VoiceXML applications using ASP.NET MVC 4. You can read about how it handles recording in this environment here.

Kevin, pressing * to enable multiple recordings is an idea I hadn't thought of, and might well prove to be what we're looking for. I'd be integrating this into our existing VoiceXML project (though I will take a look at the VoiceModel for future reference) - I'm assuming that a controller method that currently gets one file could get multiple files encoded in the HTTP header. Thanks again! Jim Stanley Blackboard Connect — Jim Stanley, Mar 19 '13 at 00:33
Kevin has the right approach for the VoiceXML aspects. Detecting the DTMF tone can also be done with some audio post-processing. DTMF tones are loud and have very specific frequencies. Some basic audio filter code should help you find the start and stop of each tone, which can then be replaced with your own audio. – Jim Rush Mar 19 '13 at 03:06
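
As a rough illustration of that post-processing idea, the Python sketch below uses the Goertzel recurrence to measure energy at the eight standard DTMF frequencies and reports where a key tone starts and stops. It is one possible filter, not necessarily what the commenter had in mind. The function names, the 205-sample block size and the 0.3 energy-fraction threshold are assumptions chosen for illustration, and samples are assumed to be a list of floats at 8000 Hz. Real recordings with speech and line noise would need tuning.

```python
import math

DTMF_LOW = (697, 770, 852, 941)        # row frequencies in Hz
DTMF_HIGH = (1209, 1336, 1477, 1633)   # column frequencies in Hz
DTMF_KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, sample_rate, freq):
    """Squared spectral magnitude of `samples` at `freq` (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples, sample_rate=8000, min_fraction=0.3):
    """Return the DTMF key in one block of samples, or None if no key is found."""
    n = len(samples)
    energy = sum(x * x for x in samples)
    if n == 0 or energy == 0:
        return None
    scale = energy * n / 2.0  # power / scale is roughly the share of energy at a frequency
    low = max(DTMF_LOW, key=lambda f: goertzel_power(samples, sample_rate, f))
    high = max(DTMF_HIGH, key=lambda f: goertzel_power(samples, sample_rate, f))
    if goertzel_power(samples, sample_rate, low) / scale < min_fraction:
        return None
    if goertzel_power(samples, sample_rate, high) / scale < min_fraction:
        return None
    return DTMF_KEYS[DTMF_LOW.index(low)][DTMF_HIGH.index(high)]


def find_dtmf_spans(samples, sample_rate=8000, block=205):
    """Scan fixed-size blocks and return a list of (start, end, key) sample ranges."""
    spans = []
    for start in range(0, len(samples) - block + 1, block):
        key = detect_dtmf(samples[start:start + block], sample_rate)
        if key is None:
            continue
        if spans and spans[-1][2] == key and spans[-1][1] == start:
            spans[-1] = (spans[-1][0], start + block, key)  # extend the current span
        else:
            spans.append((start, start + block, key))
    return spans
```

Each returned range marks where the audio could be cut and your own clip spliced in. The resolution is limited to the block size, so the edges of a tone may be off by up to one block.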

score 0 · Answer 2 · answered Mar 18 '13 at 17:26

If you want to insert a pause and stay within the UI tag: in the IVR work I have done so far, the only DTMF key that let us stay within the UI was *. We would return a "REPEAT" grammar result when '*' was pressed, and in the UI condition tag for REPEAT you would add the silence (pause) WAV file.

For the recording part, we used osdmtype = record, which mapped to an XSLT that handled the recording and recognised the customer's yes/no answer.
That said, I'm a bit confused about the exact requirement and would need more details.
Sorry, I can't add comments as I don't have enough rep.
You can mail me, or I can add more answers here.

Rameez Ahmed Sayad

Thinking it through more, what I need exactly is to use a record tag with its dtmfterm attribute set to false, but still let the user terminate the recording by pressing #. According to one VXML reference, this is possible, but in our tests, with dtmfterm set to false, the recording doesn't stop when # is pressed. – Jim Stanley Mar 18 '13 at 18:09
I am not sure, but there was another attribute, 'terminator operator', which was set to '#' by default, so we used to explicitly set the terminator operator to '' (blank). That means '#' can then return a different grammar result when pressed. Try checking along those lines. – Rameez Ahmed Sayad Mar 18 '13 at 18:35
Hmmm, we haven't used the tag much in our code, and I'm having a difficult time implementing it (Qwest is our dev IVR provider). I'm trying the following and getting "invalid message type" errors.... – Jim Stanley Mar 18 '13 at 18:55
( # ) – Jim Stanley Mar 18 '13 at 18:55
Oh yeah, also a – Jim Stanley Mar 18 '13 at 18:56
Not Qwest, CenturyLink ;) Your grammar type seems to be a different format, 'x-jsgf'; we generally used '*.grxml', which is nothing but XML itself.
But that shouldn't be an issue. Since you say CenturyLink is your dev IVR provider, I can help you; mail me at sayadrameez@gmail.com – Rameez Ahmed Sayad Mar 19 '13 at 11:36
