
I am working on an application that gathers a user's voice input for an IVR. The input we're capturing is a limited set of proper nouns, but even though we have added hints for all of the possible options, we very frequently get back unintelligible results, possibly because our users have accents from all parts of the world. I'm looking for a way to improve the speech recognition results beyond just using hints. The available Google adaptive classes will not help, as none of them match the type of input we're gathering. I see that Twilio recently added something called experimental_utterances that may help, but I'm finding little technical documentation on what it does or how to implement it.

Any guidance on how to improve our speech recognition results?

desertnaut

1 Answer


Google does a decent job of recognizing proper names, but only asynchronously, not in real time. I've not seen a PaaS tool that can do this in real time. I recommend changing your approach: identify callers based on ANI or account number, or have them record their name for manual transcription.

david

  • Thank you for the tip. Unfortunately, it's more complicated than that. What we are capturing is their native spoken language. We need to correctly identify what language they are indicating and select it from our rather long list of language names. For various reasons, we will need to continue to capture their spoken input versus them inputting some numeric value to indicate their native language. – Chris L Nuckolls Nov 07 '22 at 13:43
  • Oh boy. Can you narrow down the list based on caller ID? So, do the ASR first; if that fails, go to DTMF and have them press a key for what you think will be the most likely language based on some other call information? – David Macias Nov 07 '22 at 17:08
  • I wish it were that easy. Any one of the thousands of clientIDs could get a call for any one of several dozen languages. In the absence of any better option, I'm leaning towards using hints to identify colloquialisms to further refine recognition, i.e. recognize both "Karen" and "Karenni" rather than just "Karen" as a native language. – Chris L Nuckolls Nov 08 '22 at 16:01
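
The alias idea in the last comment, combined with the "ASR first, then fall back" suggestion, could be sketched roughly as below. This is only an illustration using Python's standard `difflib`, not anything Twilio or Google provides: the `LANGUAGE_ALIASES` table, the function names, and the 0.8 cutoff are all hypothetical, and only the "Karen"/"Karenni" pairing comes from the thread. The transcript is assumed to be whatever text the speech recognizer hands back.

```python
import difflib

# Hypothetical alias table: canonical language name -> spoken variants.
# Only the "Karen"/"Karenni" pairing comes from the thread; the other
# entries are placeholders to show the shape of the data.
LANGUAGE_ALIASES = {
    "Karen": ["karen", "karenni"],
    "Spanish": ["spanish", "espanol", "castellano"],
    "Mandarin": ["mandarin", "mandarin chinese", "putonghua"],
}


def build_lookup(aliases):
    """Flatten the alias table into {spoken variant: canonical name}."""
    return {
        variant.lower(): canonical
        for canonical, variants in aliases.items()
        for variant in variants
    }


def all_hints(aliases=LANGUAGE_ALIASES):
    """Comma-separated list of every variant, usable as a hints string."""
    return ", ".join(sorted(build_lookup(aliases)))


def match_language(transcript, aliases=LANGUAGE_ALIASES, cutoff=0.8):
    """Map a recognizer transcript to a canonical language name.

    Tries an exact alias match first, then the closest alias by string
    similarity. Returns None when nothing is close enough, which the
    caller can treat as the signal to fall back (for example to DTMF).
    """
    lookup = build_lookup(aliases)
    heard = transcript.strip().lower()
    if heard in lookup:
        return lookup[heard]
    close = difflib.get_close_matches(heard, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


if __name__ == "__main__":
    for transcript in ["Karenni", "spanich", "xyzzy"]:
        print(transcript, "->", match_language(transcript))
```

The cutoff trades false matches against fallbacks: a lower value accepts noisier transcripts but risks picking the wrong language, so it would need tuning against real call data. String similarity only helps when the recognizer returns something close to the spelling of an alias; it does not address accents that produce entirely different words.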