We have a C#/.NET solution where someone can call a phone number and speak a person's first name and then last name. The name is then entered in a guest registry on our website. We use an XML dictionary file with 5,000 first names and 89,000 last names that we got from the US Census. We are using the Microsoft.Speech.Recognition library (maybe that's the problem).

Our problem is that even with relatively easy names like Joshua McDaniels, we are getting about a 30% failure rate. Performance (speed-wise) is fine; it just fails to recognize a good portion of the names.

Now, I understand that the quality of the spoken name will ultimately dictate (sorry for the pun) how well the system performs, but we would like to get close to 99% in "laboratory" conditions, with perfect enunciation and no accent, and then call it good. But even after several trials with the same person speaking the same name on the same phone in the same environment, we are getting a 25% failure rate.

My question is: does anyone have an idea for a better approach? We thought of maybe trying an online API, so that the matches would be more relevant and current.

Nikolay Shmyrev

1 Answer

With the current state of the technology, it is very hard to recognize names, especially from a large list. You can recognize names from a phone book (500 entries) with good quality, but with thousands of entries it becomes very hard. Speech recognition engines are certainly not designed for that, in particular offline ones like System.Speech.
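One rough way to see why list size matters: the more names a list holds, the more entries have close neighbours that are easy to confuse. The sketch below is only an illustration, not how any recognizer works internally. It uses spelling similarity from Python's `difflib` as a crude stand-in for acoustic similarity, and the name lists, the `confusable_pairs` helper, and the 0.75 threshold are all made up for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def confusable_pairs(names, threshold=0.75):
    """Return pairs of names whose spelling similarity is at least `threshold`.

    Spelling similarity is an assumption standing in for acoustic
    similarity; a real engine compares sounds, not letters.
    """
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical lists: a short one with distinct names, and a longer one
# that adds near-duplicates of the kind a census-sized list contains.
small = ["Joshua", "Maria", "Robert", "Linda"]
large = small + ["Josh", "Joshuah", "Marie", "Mario", "Roberta", "Lynda"]

print(len(confusable_pairs(small)), "confusable pairs in the small list")
print(len(confusable_pairs(large)), "confusable pairs in the large list")
```

The pairwise comparison is quadratic, so it is meant for small experiments rather than for the full 89,000-name list.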

You might get way better results with online systems like https://www.projectoxford.ai which use advanced DNN acoustic models and bigger vocabularies.

Whole companies were built around the ability to recognize large name lists; Novauris, for example, used patented technology for that. You might consider building something similar with an open source engine, but it would be a large undertaking either way.

Nikolay Shmyrev
  • I did some experimentation with the MS Speech API. It works great for sentences because it uses intent so it will look at the first word, make a guess, then move on to the next word. The sample in the SDK shows it go through several iterations for a short sentence. Unfortunately, it didn't work at all for names. We decided to sunset this feature. We listened to several of the .wav files and as humans we didn't even know what they were saying. So we figured that even if we could get it working 100% of the time in laboratory conditions it would fall to the current 70-80% so why bother. – Denver Coder Oct 27 '15 at 16:58
  • @Nikolay Shmyrev "You can recognize names from the phone book (500 entries) with good quality" - which software, API or library could do that? I am doing some research and couldn't find one where you can match against a custom list. – sigmaxf Dec 24 '16 at 06:21
  • Any modern engine that supports specifying the list beforehand: AT&T, MS Project Oxford. Among open source solutions, Kaldi. – Nikolay Shmyrev Dec 25 '16 at 00:35
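On matching against a custom list: the engines named above take the list before recognition starts. A different and much simpler technique, shown here only as a sketch, is to let any engine produce a free-form transcript and then snap it to the closest entry in your own list afterwards. This uses Python's standard `difflib.get_close_matches`; the `match_name` helper, the sample names, and the 0.6 cutoff are assumptions for illustration, and text similarity will not rescue a transcript that is badly wrong to begin with.

```python
from difflib import get_close_matches


def match_name(transcript, known_names, cutoff=0.6):
    """Return the known name closest to `transcript`, or None.

    `cutoff` is the minimum difflib similarity (0 to 1) required for a
    match; 0.6 is difflib's default, not a tuned value.
    """
    # Compare case-insensitively but return the original spelling.
    lookup = {name.lower(): name for name in known_names}
    hits = get_close_matches(transcript.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


# Hypothetical custom list and a transcript as an engine might return it.
last_names = ["McDaniels", "McDonald", "Daniels", "Shmyrev"]
print(match_name("mac daniels", last_names))
```

Returning `None` below the cutoff lets the caller ask the speaker to repeat the name instead of registering a wrong one.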