I am trying to convert the output of the CMU Sphinx recognizer, i.e. a list of < hypothesis (i.e. phrase), score (in the log domain) > pairs obtained by tweaking test_ps_nbest.c, into the following form: a list of < hypothesis (i.e. phrase), "probability" (between 0 and 1) > pairs.
The trivial method I am using now is as follows:
- Divide each confidence score by the language weight (e.g. 11)
- Normalize the list of confidence scores in the log domain
- Output probability = exp(normalized confidence score)
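For concreteness, the three steps above can be sketched roughly as follows. This is only an illustration of the method as described, not PocketSphinx code: the function name toProbabilities is hypothetical, and it assumes the scores can be treated as natural logarithms. If the recognizer reports scores in a different log base, they would need to be converted first. The normalization subtracts the maximum before exponentiating so that very negative scores do not underflow.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of PocketSphinx): maps n-best log scores to
// values in [0, 1] that sum to 1 over the list.
// Assumption: scores are natural logs; convert first if the base differs.
std::vector<double> toProbabilities(const std::vector<double>& logScores,
                                    double languageWeight) {
    std::vector<double> probs;
    if (logScores.empty()) {
        return probs;
    }
    probs.reserve(logScores.size());

    // Step 1: divide each score by the language weight (e.g. 11).
    for (double score : logScores) {
        probs.push_back(score / languageWeight);
    }

    // Step 2: normalize in the log domain. The normalizer is
    // log(sum(exp(s_i))), computed as max + log(sum(exp(s_i - max)))
    // so that exp() never sees a hugely negative argument.
    const double maxScaled = *std::max_element(probs.begin(), probs.end());
    double sumExp = 0.0;
    for (double scaled : probs) {
        sumExp += std::exp(scaled - maxScaled);
    }
    const double logNormalizer = maxScaled + std::log(sumExp);

    // Step 3: probability = exp(normalized score).
    for (double& scaled : probs) {
        scaled = std::exp(scaled - logNormalizer);
    }
    return probs;
}
```

Calling toProbabilities with the scores of the n-best list and a language weight of 11 returns one value per hypothesis, in the same order as the input.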
The problem is that the output probability from the above method is biased. Do you have any suggestions for how I can estimate the bias in the probability?
Example signature of the method that I have to implement to correct the bias:
vector < double > getBias(vector < string > phrases, vector < double > logConfidenceScores)
Example input for the above discussion:
< "HE GOT IN OUR HEAD HEART LUNG AND HE MARKED IT", -43278 >
< "HE GOT IN OUR AT OUR CLASSES MONEY AND HE MARKED IT", -43449 >
< "HE GOT IN POWER AT HEART LUNG AND HE MARKED IT", -43368 >