I am trying to recognize words from cropped images of the words themselves by training a CRNN (CNN+LSTM+CTC) model. I am confused about how to obtain a confidence score along with the recognized words. I am using TensorFlow and following the implementation at https://github.com/TJCVRS/CRNN_Tensorflow. Can someone suggest how to modify the Connectionist Temporal Classification (CTC) layer of the network so that it also gives us a confidence score?
2 Answers
There are two solutions I can think of right now:
- Both TensorFlow decoders provide information about the score of the recognized text: ctc_greedy_decoder returns neg_sum_logits, which contains a score for each batch element, and ctc_beam_search_decoder returns log_probabilities, which contains the scores for each beam of each batch element.
- Take the recognized text from either of the two decoders. Put another CTC loss operation into your code and feed the RNN output matrix and the recognized text into it. The result is the negative log-probability of seeing the given text in the matrix (you have to undo the minus and the log to get a probability, but that is easy).
Solution (1) is faster and simpler to implement, while solution (2) is more accurate. The difference should not be too large, though, as long as the CRNN is well trained and the beam width of the beam search decoder is large enough.
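To illustrate what solution (1) gives you, here is a NumPy sketch of best-path (greedy) decoding on a made-up per-timestep softmax matrix. Note this is only an illustration: TF's ctc_greedy_decoder operates on logits, so the score below only mimics the idea behind neg_sum_logits, and the matrix values are invented.

```python
import numpy as np

def greedy_decode_with_score(mat, blank=0):
    """Best-path decoding on a TxC softmax matrix `mat`.
    Returns (labels, score); the score is the negative sum of the
    log-probabilities along the best path, a stand-in for the
    per-element score that tf.nn.ctc_greedy_decoder reports."""
    best = mat.argmax(axis=1)  # most likely class per timestep
    score = -np.log(mat[np.arange(len(best)), best]).sum()
    # standard CTC post-processing: collapse repeats, then drop blanks
    labels, prev = [], None
    for c in best:
        if c != prev and c != blank:
            labels.append(int(c))
        prev = c
    return labels, score

# invented 2-timestep, 2-class (blank + one character) example
mat = np.array([[0.1, 0.9],
                [0.8, 0.2]])
labels, score = greedy_decode_with_score(mat)
```

A lower score means the decoder is more confident, since it is a negative sum of log-probabilities.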
Look into the TF-CRNN code at the following line - the score is already returned as the variable log_prob: https://github.com/MaybeShewill-CV/CRNN_Tensorflow/blob/master/tools/train_shadownet.py#L62
And here is a self-contained code sample which illustrates solution (2): https://gist.github.com/githubharald/8b6f3d489fc014b0faccbae8542060dc

- So you mean to say log_prob is the confidence score for the recognized word? – vinayak A Jun 07 '18 at 11:12
- It is the (approximated) probability of the recognized word ... if this is what you call "confidence score", then yes. (As already said ... take care of the log in case you need probability values between 0 and 1.) – Harry Jun 07 '18 at 11:24
- Sorry for asking this dumb question: what should I do to get probability values between 0 and 1 from log_prob? exp(log_prob)? – vinayak A Jun 07 '18 at 14:24
- Yes, for ctc_beam_search_decoder that should do it. In case the log_prob values are greater than 0, you first have to multiply by -1, which seems to be the case for ctc_greedy_decoder according to the documentation. – Harry Jun 07 '18 at 14:53
- Okay, I use ctc_beam_search_decoder, so the lower the log_prob, the higher the accuracy, right? Because exp(0) is 1. Or should I multiply log_prob by -1 before taking the exponential? – vinayak A Jun 07 '18 at 18:18
- No. Have a look at what the log function looks like, e.g. type log(x) into Google and it will show the graph. The higher the probability (x-axis), the higher the log-probability (y-axis), and vice versa. You can brute-force the right approach: apply exp(log_prob), and if the values are not between 0 and 1, apply exp(-log_prob) instead. – Harry Jun 07 '18 at 18:50
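To make the conversion discussed in the comments concrete, here is a tiny sketch (the log_prob values are made up; real ones come from the decoder):

```python
import numpy as np

# hypothetical log-probabilities as a beam search decoder might return
# them (log-probabilities of valid probabilities are always <= 0)
log_prob = np.array([-0.11, -2.30])

prob = np.exp(log_prob)  # back to probabilities in [0, 1]
# if the decoder reported positive values, use np.exp(-log_prob) instead
```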
One update from myself:
I finally obtained a score by passing the predicted label back into the CTC loss function and taking the anti-log of the negative of the resulting loss. I am finding this value more accurate than taking the anti-log of log_prob.
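To show what this update computes, here is a self-contained NumPy sketch of the CTC forward algorithm: it sums the probability of a given label sequence over all CTC alignments, which is exactly exp(-ctc_loss). The matrix and labels are invented for illustration; in the real pipeline tf.nn.ctc_loss plays this role.

```python
import numpy as np

def ctc_label_probability(mat, labels, blank=0):
    """CTC forward algorithm: probability of seeing `labels` in the
    TxC per-timestep softmax matrix `mat`, summed over all valid
    alignments. Equals exp(-ctc_loss) for that label sequence."""
    # extend the labels with blanks: [b, l1, b, l2, ..., b]
    ext = [blank]
    for l in labels:
        ext += [l, blank]
    T, S = mat.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = mat[0, blank]
    if S > 1:
        alpha[0, 1] = mat[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s > 0:
                a += alpha[t - 1, s - 1]
            # skip transition, allowed between distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * mat[t, ext[s]]
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)

# invented example: 2 timesteps, 2 classes (blank=0, one character=1);
# the score for label [1] sums the alignments (1,1), (1,-), (-,1)
mat = np.array([[0.4, 0.6],
                [0.4, 0.6]])
p = ctc_label_probability(mat, [1])  # 0.36 + 0.24 + 0.24 = 0.84
```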
