I am using Sphinx4 to align audio with a text transcript. I want to get the timing (start and end) of each word in the sentence, and also the timing of each phoneme within the words. To do this I modified the code of SpeechAligner. The method I edited is:

```java
public List<WordResult> align(URL audioUrl, List<String> sentenceTranscript) throws IOException {...}
```

I only added a list (alignResult) that collects the raw Result objects rather than WordResult:
```java
List<WordResult> hypothesis = new ArrayList<WordResult>();
Result result;
while (null != (result = recognizer.recognize())) {
    // alignResult is the List<Result> I added; every raw Result is kept here
    alignResult.add(result);
    logger.info("Utterance result " + result.getTimedBestResult(true));
    hypothesis.addAll(result.getTimedBestResult(false));
}
```
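To turn the frames visited by the collected tokens into start/end times, a minimal sketch is below. It does not use the Sphinx4 API: Obs is a hypothetical record holding a unit label and a frame index, in time order, one per frame, and the 10 ms frame shift is an assumption (a common front-end setting), so check the front end actually configured. Consecutive frames with the same label are merged into one segment, which means two identical adjacent phones would be merged as well.

```java
import java.util.ArrayList;
import java.util.List;

public class PhoneTiming {
    // Assumed frame shift; adjust to the configured front end.
    static final int FRAME_SHIFT_MS = 10;

    // Hypothetical per-frame observation (not a Sphinx4 type).
    record Obs(String label, int frame) {}

    // A unit with start (inclusive) and end (exclusive) time in milliseconds.
    record Segment(String label, int startMs, int endMs) {}

    // Merges runs of consecutive observations with the same label into segments.
    // totalFrames marks where the last segment ends.
    static List<Segment> segment(List<Obs> obs, int totalFrames) {
        List<Segment> out = new ArrayList<>();
        int i = 0;
        while (i < obs.size()) {
            int j = i;
            // Advance past every following observation carrying the same label.
            while (j + 1 < obs.size() && obs.get(j + 1).label().equals(obs.get(i).label())) {
                j++;
            }
            int startFrame = obs.get(i).frame();
            int endFrame = (j + 1 < obs.size()) ? obs.get(j + 1).frame() : totalFrames;
            out.add(new Segment(obs.get(i).label(),
                    startFrame * FRAME_SHIFT_MS, endFrame * FRAME_SHIFT_MS));
            i = j + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative data only; prints "SIL 0-20", "d 20-50", "e 50-90", "SIL 90-120".
        List<Obs> obs = List.of(new Obs("SIL", 0), new Obs("SIL", 1),
                new Obs("d", 2), new Obs("e", 5), new Obs("SIL", 9));
        for (Segment s : segment(obs, 12)) {
            System.out.println(s.label() + " " + s.startMs() + "-" + s.endMs());
        }
    }
}
```

Records require Java 16 or later; on older versions the two records can be replaced by small final classes with the same fields.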
Then I followed this example exactly: Phonemes Timestamp

For the sentence "des adversaires" I expect this: expected result

But the result is shifted by one word toward the beginning: the leading silence takes the spelling of the word "des", "des" takes the spelling of "adversaires", and so on (as if the second silence were ignored). This is what I get: what I get
To display the token and the units, I use:

```java
System.out.println("token : " + token.getWordPath() + " - unit : " + unit.toString());
```
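One possible explanation for the one-word shift, offered as an assumption rather than a confirmed fact about Sphinx4: if the search graph emits a word's token after the units of its pronunciation, then getWordPath() read at a unit still ends with the previous word, so the leading silence appears to own the units of "des". The sketch below uses a hypothetical Step stream (not a Sphinx4 type, with made-up phone labels) to show that attaching each unit to the last word seen reproduces the symptom, while attaching it to the next word marker removes the shift. It is worth checking the order of word and unit states in the actual token chain before relying on this.

```java
import java.util.ArrayList;
import java.util.List;

public class UnitAttribution {
    // Hypothetical time-ordered step: either a unit or a word marker.
    record Step(boolean isWord, String label) {}

    static Step u(String label) { return new Step(false, label); }
    static Step w(String label) { return new Step(true, label); }

    // Naive: units belong to the last word seen (reproduces the shift).
    static List<String> attachToPrevious(List<Step> steps) {
        List<String> out = new ArrayList<>();
        String word = null;
        List<String> units = new ArrayList<>();
        for (Step s : steps) {
            if (s.isWord()) {
                if (word != null) out.add(word + "=" + units);
                word = s.label();
                units = new ArrayList<>();
            } else {
                units.add(s.label());
            }
        }
        if (word != null) out.add(word + "=" + units);
        return out;
    }

    // Assumes a word marker closes its own units: buffer units until the next marker.
    static List<String> attachToNext(List<Step> steps) {
        List<String> out = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (Step s : steps) {
            if (s.isWord()) {
                // Flush the units buffered since the previous word marker.
                out.add(s.label() + "=" + pending);
                pending = new ArrayList<>();
            } else {
                pending.add(s.label());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(u("SIL"), w("<sil>"), u("d"), u("e"), w("des"),
                u("a"), u("d"), u("v"), w("adversaires"), u("SIL"), w("<sil>"));
        // prints [<sil>=[d, e], des=[a, d, v], adversaires=[SIL], <sil>=[]]
        System.out.println(attachToPrevious(steps));
        // prints [<sil>=[SIL], des=[d, e], adversaires=[a, d, v], <sil>=[SIL]]
        System.out.println(attachToNext(steps));
    }
}
```

The naive output matches the described symptom: every word takes the units of the word after it, and the final silence ends up empty.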
Thanks in advance,