
I'm using the SpeechRecognizer via Intent:

Intent i = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
i.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);

i.putExtra(RecognizerIntent.EXTRA_PROMPT,
        "straight talk please");

i.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 5);
i.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");

startActivityForResult(i, 0);

And I get the results in onActivityResult() like this:

protected void onActivityResult(int requestCode, int resultCode, Intent data) {

    if (requestCode == 0 && resultCode == RESULT_OK) {

        // List with the results from the Voice Recognition API
        ArrayList<String> results = data
                .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);

        // The confidence array
        float[] confidence = data.getFloatArrayExtra(
                RecognizerIntent.EXTRA_CONFIDENCE_SCORES);

        // The confidence results       
        for (int i = 0; i < confidence.length; i++) {
            Log.v("oAR", "confidence[" + i + "] = " + confidence[i]);
        }
    }

    super.onActivityResult(requestCode, resultCode, data);
}

But every element of the float array except the first always comes back as 0.0, like this:

confidence[0] = any value between 0 and 1
confidence[1] = 0.0
confidence[2] = 0.0
and so on

I would expect every result to have a confidence value between 0 and 1. Otherwise the array seems pretty useless, because the result with the highest confidence is the first element by default anyway, even without EXTRA_CONFIDENCE_SCORES. Is there something I'm missing?

Furthermore, RecognizerIntent.EXTRA_CONFIDENCE_SCORES is supposed to be used on API level 14+. But no matter which API level above 8 I run it on, the result stays the same. Are the docs out of date on that point?

Steve Benett
    Not sure what kind of answer you expect. It works the way it was implemented by Google and there is little hope you can change it in the current version. Maybe it will be supported in a future version. It's better to rethink the application you are trying to build and select the right tool to implement it. Open source speech recognition toolkits are way more flexible in this regard and at least you can get something using them. – Nikolay Shmyrev Sep 23 '13 at 13:56
    @NikolayShmyrev What I found out is that this feature isn't usable as the description in the docs claims. Because of that I expected answers like: you use it wrong, this isn't supported the way you think, or it's not possible because it's just a placeholder for a future implementation. I tried to use an implementation from the Android framework because the user is usually familiar with it. But it looks like a third-party library is the only option. I just want to make sure that the docs need an update on this point. – Steve Benett Sep 23 '13 at 15:06
    The docs are not out of date, as already explained in an answer to your similar question, see: http://stackoverflow.com/questions/18694497/speech-recognizer-get-confidence-below-api-14/18735510#18735510 – Kaarel Sep 27 '13 at 16:34

3 Answers


According to my interpretation of the documentation:

RecognizerIntent.EXTRA_RESULTS returns an ordered ArrayList of strings, each of which is one suggestion as to what was said, with the string at index 0 being the suggestion the Recognizer is most confident of.

RecognizerIntent.EXTRA_CONFIDENCE_SCORES returns an array of floats corresponding to each of these suggestions.

So, if the results you are getting are correct (otherwise this might be a bug), then the recognizer has one, and only one, suggestion that it has confidence in, and several others that it has only negligible or no confidence in.

I've been getting similar results. I've never had a set of results in which more than one suggestion had non-negligible confidence, just like you. e.g. 0.7435, 0.0, 0.0, 0.0, ......

I have however sometimes gotten a set of results in which ALL results have negligible confidence. e.g. 0.0, 0.0, 0.0, 0.0, 0.0, ......

So yes, the first element in EXTRA_RESULTS will always be the one the Recognizer is most confident of.
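Given that behavior, one defensive pattern is to trust only confidence[0] and ignore the rest. Here's a plain-Java sketch (no Android dependencies; the helper name `pickBestResult` and the threshold value are my own choices, not part of the API):

```java
import java.util.Arrays;
import java.util.List;

public class RecognitionHelper {

    /**
     * Returns the top suggestion if its confidence clears the threshold,
     * or null if nothing usable was recognized. Only confidence[0] is
     * treated as meaningful, since the other entries come back as 0.0.
     */
    static String pickBestResult(List<String> results, float[] confidence, float threshold) {
        if (results == null || results.isEmpty()) {
            return null;
        }
        // No scores at all (older API level, or the engine omitted them):
        // fall back to the first result, which is the best guess by ordering.
        if (confidence == null || confidence.length == 0) {
            return results.get(0);
        }
        // The list is ordered by confidence, so index 0 is the best candidate.
        return confidence[0] >= threshold ? results.get(0) : null;
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("hello world", "hello word", "yellow world");
        float[] scores = {0.74f, 0.0f, 0.0f};

        System.out.println(pickBestResult(results, scores, 0.5f)); // prints "hello world"
        System.out.println(pickBestResult(results, scores, 0.9f)); // prints "null"
        System.out.println(pickBestResult(results, null, 0.5f));   // prints "hello world"
    }
}
```

In onActivityResult you would pass in the list from EXTRA_RESULTS and the array from EXTRA_CONFIDENCE_SCORES and only act on a non-null return.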

Owen Ryan

I haven't worked with speech recognition. But since, as you said, you are getting 0.0 for the float array values, it may be that the array is null. Can you please check whether the float[] is returning null:

ArrayList<String> results = data
        .getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);

float[] confidence = data.getFloatArrayExtra(
        RecognizerIntent.EXTRA_CONFIDENCE_SCORES);

if (confidence == null) {
    for (int i = 0; i < results.size(); i++) {
        Log.d(TAG, i + ": " + results.get(i));
    }
} else {
    for (int i = 0; i < results.size(); i++) {
        Log.d(TAG, i + ": " + results.get(i) + " confidence: " + confidence[i]);
    }
}

Can you please check the book Professional Android Sensor Programming by Greg Milette and Adam Stroud? It will surely help you. You will find some details on page 394 of that book.

Pradip
  • The first element in the array has a value between 0 and 1. All other elements of the array are 0.0, so the array isn't null. I would expect that for every result there is a confidence score in the float array, but that's not the case. The size of the results List and the confidence array are the same. The book you linked, the Google docs, and Reto Meier's book all use exactly this. But it makes no sense if only one result gets a confidence score. – Steve Benett Sep 27 '13 at 10:13

Conventional speech recognition algorithms can return a confidence only for the 1-best result, because that is the result compared against all other hypotheses to calculate the confidence. It is also possible to return the N best results instead of just the 1-best; however, it is much harder to calculate confidences for them.

It seems that Google implemented only the conventional approach and reserved a place in the API for more detailed results with n-best confidences.

You just have to wait for Google to implement everything properly.

Nikolay Shmyrev
  • No offence, but what's your source for this info? It makes sense that the array is just acting as a placeholder, but I would like to see a proof. – Steve Benett Dec 09 '15 at 22:31
  • If you check most modern algorithms for confidence scoring like consensus decoding http://arxiv.org/abs/cs/0010012 or minimum bayes risk decoding http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/34628.pdf, they all are 1-to-everything else, which means that they estimate the probability of hypothesis against all other possible outcomes, not n-best hypothesis. N-best is calculated with a separate algorithm and different outcomes are not compared with each other, just compared with everything else. – Nikolay Shmyrev Dec 10 '15 at 11:39
  • This is a natural approach, because you cannot really rank hypothesis results in the space of all possible outcomes; that space is just too large, and it might not fit your expectation of what confidence is. If you look at open source toolkits like SRILM or Kaldi, they also provide 1-best confidence or n-best results from a lattice, but never both. – Nikolay Shmyrev Dec 10 '15 at 11:41
  • Maybe there is a parameter which switches the Google recognition algorithm to n-best result mode? – Andrey Epifantsev Feb 29 '20 at 06:30