I am reading the paper by Mark Gales and Steve Young on HMM-GMM speech recognition. On page 205, second paragraph, it says:
"For each utterance Y(r), r = 1, . . . , R, of length T(r) the sequence of baseforms, the HMMs that correspond to the word-sequence in the utterance, is found and the corresponding composite HMM constructed"
I do not clearly understand what Y(r) and T(r) are. Can someone clarify? In particular, what do r and R stand for?
Similarly, in section 2.1 of the paper "A Parallel Implementation of Viterbi Training for Acoustic Models using Graphics Processing Units", the author writes:
"Given a set of training observations O(r), 1 ≤ r ≤ R and an HMM state sequence 1 < j < N the observations sequence is aligned to the state sequence via Viterbi alignment."
I know both sentences are similar, but in this paper as well I do not understand what r and R are.