
Greetings, Stack Overflow community,

Is it possible to take what a user says or enters (for example, the digits 1-9) and, instead of having the text-to-speech engine read the numbers back, play a prerecorded audio clip so that it sounds like our voiceover artist instead of a robot?

Can this be done dynamically, based on what the user inputs?

All I'm really asking for is a nudge in the right direction on how to start figuring this out.

Krunkmaster

1 Answer


You can. A long time ago I wrote logic that took the desired phrase and a list of available clips and found the largest segments that could be used to assemble the audio (clips often contained multiple phrases). The result tends to sound very choppy, but it is possible if you have enough prerecorded audio. In my case the content was in a niche, and we reached 95% coverage with only a couple thousand recordings.
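One way to do the "largest segments" search is a greedy longest-match scan: at each position, try the longest run of remaining words that some clip covers, then shrink it one word at a time. This is a minimal sketch, not the answerer's original code; the clip inventory and file names (the_quick.wav, brown_fox.wav, ...) are hypothetical, and words with no recording are simply reported back.

```javascript
// Greedy longest-match planner (illustrative sketch).
// clipIndex maps a lowercase phrase to the file that contains it.
function planClips(text, clipIndex) {
  var words = text.toLowerCase().split(/\s+/).filter(function (w) {
    return w.length > 0;
  });
  var plan = [];
  var missing = [];
  var i = 0;
  while (i < words.length) {
    var matched = false;
    // Try the longest possible span starting at i, then shrink it.
    for (var end = words.length; end > i; end--) {
      var phrase = words.slice(i, end).join(' ');
      if (Object.prototype.hasOwnProperty.call(clipIndex, phrase)) {
        plan.push(clipIndex[phrase]);
        i = end;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No recording covers this word; a real system might fall back to TTS here.
      missing.push(words[i]);
      i++;
    }
  }
  return { clips: plan, missing: missing };
}

// Hypothetical inventory: some clips hold multi-word phrases.
var clips = {
  'the quick': 'the_quick.wav',
  'the': 'the.wav',
  'quick': 'quick.wav',
  'brown fox': 'brown_fox.wav',
  'fox': 'fox.wav'
};

var result = planClips('The quick brown fox jumps', clips);
// result.clips   -> ['the_quick.wav', 'brown_fox.wav']
// result.missing -> ['jumps']
```

Greedy matching is simple, but it does not always minimize the number of clips; a dynamic-programming search over every split point would, at somewhat higher cost.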

In the end, it was just basic search logic to find clips. If you work at the word level, you can name each clip after its word, split the input into words, and generate one audio tag per word: <audio src='the.wav'/><audio src='quick.wav'/><audio src='brown.wav'/><audio src='fox.wav'/>...
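A word-level version of that idea, for the case where the page is generated on the server, might look like the sketch below. It assumes every word has its own recording named <word>.wav in the same directory as the page, as in the example tags; the function name is hypothetical.

```javascript
// Word-level assembly (illustrative sketch).
// Assumes one recording per word, named "<word>.wav".
function wordsToAudioTags(text) {
  return text
    .toLowerCase()
    // Drop punctuation so "fox." maps to fox.wav; this also keeps
    // the generated src attribute free of quotes or other odd characters.
    .replace(/[^a-z0-9\s]/g, '')
    .split(/\s+/)
    .filter(function (w) { return w.length > 0; })
    .map(function (w) { return "<audio src='" + w + ".wav'/>"; })
    .join('');
}

var markup = wordsToAudioTags('The quick, brown fox.');
// markup === "<audio src='the.wav'/><audio src='quick.wav'/><audio src='brown.wav'/><audio src='fox.wav'/>"
```

In VoiceXML the audio element can also carry fallback content (for example <audio src='fox.wav'>fox</audio>), which the platform plays, typically through TTS, if the file cannot be played, so a missing recording degrades to the robot voice rather than silence.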

Jim Rush
  • Fortunately, I only have to do the digits 0-9. Do you know of any documentation that would help me understand the concept? Also, thank you for your reply; I really appreciate it. – Krunkmaster Sep 22 '16 at 23:38
  • Are you asking for logic to split a text string and generate a list of audio clips? If this is client-side VoiceXML, the easiest approach would be to build a list of file names in JavaScript and play the array with a foreach element. If the code is server-generated, generate the audio elements directly on the page. – Jim Rush Sep 23 '16 at 18:17
  • Thanks for the push in the correct direction. I appreciate your time and expertise – Krunkmaster Sep 23 '16 at 18:21
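For the digits-only, client-side case described in the comments, the script part could be as small as the sketch below: look up each entered digit in a table of recordings and return an array of clip URIs. The file names under audio/ and the function name are hypothetical, and the code sticks to older ECMAScript syntax on the assumption that the VoiceXML platform's script engine may not support newer features.

```javascript
// Hypothetical locations of the voiceover recordings for each digit.
var DIGIT_CLIPS = {
  '0': 'audio/zero.wav',  '1': 'audio/one.wav',  '2': 'audio/two.wav',
  '3': 'audio/three.wav', '4': 'audio/four.wav', '5': 'audio/five.wav',
  '6': 'audio/six.wav',   '7': 'audio/seven.wav', '8': 'audio/eight.wav',
  '9': 'audio/nine.wav'
};

// Turn the user's digits into an ordered list of clip URIs.
function digitsToClipList(input) {
  var list = [];
  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i);
    // Characters with no recording (spaces, dashes) are skipped.
    if (Object.prototype.hasOwnProperty.call(DIGIT_CLIPS, ch)) {
      list.push(DIGIT_CLIPS[ch]);
    }
  }
  return list;
}

var clipList = digitsToClipList('2 0-1 6');
// clipList -> ['audio/two.wav', 'audio/zero.wav', 'audio/one.wav', 'audio/six.wav']
```

With VoiceXML 2.1, the resulting array could be stored in a variable (say clips) and played roughly as <prompt><foreach array="clips" item="clip"><audio expr="clip"/></foreach></prompt>; check your platform's documentation, since foreach is a 2.1 addition.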