
I am using Sphinx4 to align audio against a text. I want the timing (start, end) of each word in the sentence, and also the timing of each phoneme within each word. To do this I modified the code of SpeechAligner. The method I edited is:

public List<WordResult> align(URL audioUrl, List<String> sentenceTranscript) throws IOException {...}

I just added a list in which I collect each result as a Result object (not a WordResult).

List<WordResult> hypothesis = new ArrayList<WordResult>();
Result result;
while (null != (result = recognizer.recognize())) {
    alignResult.add(result); // I am collecting the raw results here
    logger.info("Utterance result " + result.getTimedBestResult(true));
    hypothesis.addAll(result.getTimedBestResult(false));
}

Then I followed the "Phonemes Timestamp" example exactly.

For the sentence "des adversaires", I expect to get: [image: expected result]

But the result is shifted by one word toward the beginning: the preceding entry takes the spelling of the word "des", "des" takes the spelling of "adversaires", and so on (as if the second silence were ignored). I am getting this: [image: what I get]

To display the tokens and the units, I use:

System.out.println("token : " + token.getWordPath() + " - unit : " + unit.toString());

Thanks in advance,

user1828433
@NikolayShmyrev Unfortunately, changing true to false doesn't change anything (anyway, logger.info("Utterance result " + result.getTimedBestResult(true)); is just a log line). – user1828433 Aug 24 '16 at 22:24

1 Answer


There are two types of linguists in sphinx4: FlatLinguist places unit tokens before the actual phoneme detectors, while the lextree linguist places them after. The Result class in sphinx4 has a branch to handle this:

    if (wordTokenFirst) {
        return getTimedWordPath(token, withFillers);
    } else {
        return getTimedWordTokenLastPath(token, withFillers);
    }

The code on the wiki page is written for the lextree linguist, where unit tokens come after the detector tokens. The aligner uses FlatLinguist, where unit tokens come before. So you have to refactor the sample code from the wiki accordingly. It is not a trivial change.
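
To illustrate why a mismatch in ordering shifts everything by one word, here is a minimal standalone sketch. It does not use any sphinx4 classes: Tok, group and samplePath are hypothetical names, and the unit labels are made up for illustration rather than taken from a real acoustic model. It assumes the token path has already been flattened into a sequence of word markers and unit labels, and it only models the general effect; which ordering each linguist actually produces should be checked against the sphinx4 source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone model of the ordering problem; none of these types are sphinx4 classes.
final class UnitGrouping {

    // Hypothetical flattened path entry: either a word marker or a unit (phoneme) label.
    static final class Tok {
        final boolean isWord;
        final String label;
        Tok(boolean isWord, String label) { this.isWord = isWord; this.label = label; }
    }

    static Tok word(String label) { return new Tok(true, label); }
    static Tok unit(String label) { return new Tok(false, label); }

    // Groups units under words and returns lines of the form "word: unit unit ...".
    // wordTokenFirst == true : a word marker opens its group (its units follow it).
    // wordTokenFirst == false: a word marker closes its group (its units precede it).
    static List<String> group(List<Tok> path, boolean wordTokenFirst) {
        List<String> out = new ArrayList<>();
        List<String> units = new ArrayList<>();
        String open = null; // word currently collecting units (word-first mode only)
        for (Tok t : path) {
            if (!t.isWord) {
                units.add(t.label);
            } else if (wordTokenFirst) {
                // Units seen before the very first word have no owner and are dropped.
                if (open != null) out.add(open + ": " + String.join(" ", units));
                open = t.label;
                units = new ArrayList<>();
            } else {
                out.add(t.label + ": " + String.join(" ", units));
                units = new ArrayList<>();
            }
        }
        if (wordTokenFirst && open != null) out.add(open + ": " + String.join(" ", units));
        return out;
    }

    // Made-up path in which every word marker comes AFTER its own units.
    static List<Tok> samplePath() {
        return Arrays.asList(
            unit("SIL"), word("<sil>"),
            unit("d"), unit("e"), word("des"),
            unit("a"), unit("d"), unit("v"), word("adversaires"));
    }

    public static void main(String[] args) {
        System.out.println("matching order:   " + group(samplePath(), false));
        System.out.println("mismatched order: " + group(samplePath(), true));
    }
}
```

Reading the sample path with the matching assumption keeps each word with its own units. Reading the same path with the opposite assumption attaches the units of "des" to the silence and the units of "adversaires" to "des", which is the same kind of one-word shift described in the question.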

Nikolay Shmyrev