I'm using CMUSphinx for text alignment. I downloaded the latest Sphinx4 and built a text aligner by modifying one of the demos, using the WSJ acoustic models and dictionaries that come with the code. It works occasionally, but for many recordings of simple text with quite clear pronunciation, the alignment just fails.
What could be the reason? Is it that the language models I'm using are too limited, and I should download more model data to feed the recognizer? Is there a good prepackaged Sphinx distribution that would save me from testing different language models and configuring the software myself?
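Before downloading more model data, one cheap thing to rule out is vocabulary coverage: an aligner builds its search space from the reference text, so a word with no entry in the pronunciation dictionary is a plausible cause of failure. Below is a minimal, hypothetical sketch (the class and method names are my own, not Sphinx4 API) that assumes a CMUdict-style dictionary like the one bundled with the WSJ model (`WORD  PH O N E S`, alternate pronunciations written as `WORD(2)`). The tokenization (lowercase, strip everything except letters and apostrophes) is also an assumption; Sphinx4's grammar may split text differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: lists transcript words that the pronunciation dictionary does not cover. */
public class TranscriptVocabularyCheck {

    /** Reads headwords from a CMUdict-style dictionary, folding "WORD(2)" variants into "word". */
    public static Set<String> loadWords(Reader dictionary) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(dictionary)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                // Skip blank lines and ";;;" comment lines
                if (line.isEmpty() || line.startsWith(";;;")) {
                    continue;
                }
                String head = line.split("\\s+", 2)[0];
                int paren = head.indexOf('(');
                if (paren > 0) {
                    head = head.substring(0, paren); // "READ(2)" -> "READ"
                }
                words.add(head.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns each distinct transcript word (assumed tokenization) missing from the dictionary, in order. */
    public static List<String> missingWords(String transcript, Set<String> dictionaryWords) {
        List<String> missing = new ArrayList<>();
        for (String token : transcript.toLowerCase(Locale.ROOT).split("\\s+")) {
            String word = token.replaceAll("[^a-z']", ""); // drop punctuation and digits, keep apostrophes
            if (!word.isEmpty() && !dictionaryWords.contains(word) && !missing.contains(word)) {
                missing.add(word);
            }
        }
        return missing;
    }
}
```

If the returned list is non-empty for a sentence that fails to align, adding those words to the dictionary (or fixing how the text is normalized) is probably worth trying before swapping models.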
And thanks a lot :)
Here's the code I think matters:
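```java
// Read the whole Ogg file into memory (readContentOfAOggFile() is my own helper)
byte[] bytes = readContentOfAOggFile();
ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);

// Load the reference transcript into the alignment grammar
grammar = (ResetableTextAlignGrammar) cm.lookup("textAlignGrammar");
grammar.setTextAfterAllocation(referenceText);

// Decode the audio; Java Sound only reads Ogg Vorbis if a third-party SPI is on the classpath
AudioInputStream ai = AudioSystem.getAudioInputStream(inputStream);

// Look up the data source before using it, then feed it the decoded stream
dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
dataSource.setInputStream(ai, null);

result = recognizer.recognize();
```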
Please note that this code works for only about half of the single-word sentences.
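Another thing worth ruling out is an audio format mismatch. The WSJ acoustic model shipped with Sphinx4 is, as far as I know, a 16 kHz model, while audio decoded from Ogg is often 44.1 kHz stereo; feeding mismatched audio would be expected to hurt recognition badly. The sketch below is hypothetical (the class name and the exact target format, 16 kHz, 16-bit, mono, signed little-endian PCM, are assumptions to check against your model's config). It uses only standard `javax.sound.sampled` calls, and it fails loudly when the JDK cannot convert, since the built-in codecs may not cover resampling or channel downmixing.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

/** Hypothetical helper: brings decoded audio into the format a 16 kHz acoustic model expects. */
public class SphinxAudioFormat {

    // Assumed target: 16 kHz, 16-bit, mono, signed PCM, little-endian
    public static final AudioFormat TARGET = new AudioFormat(16000f, 16, 1, true, false);

    /** True if the stream must be converted before it is handed to the recognizer. */
    public static boolean needsConversion(AudioFormat source) {
        return !source.matches(TARGET);
    }

    /**
     * Returns the stream unchanged if it is already in TARGET format, otherwise a converted
     * stream. Throws IllegalArgumentException when Java Sound has no suitable converter.
     */
    public static AudioInputStream toTargetFormat(AudioInputStream in) {
        AudioFormat source = in.getFormat();
        if (!needsConversion(source)) {
            return in;
        }
        if (!AudioSystem.isConversionSupported(TARGET, source)) {
            throw new IllegalArgumentException("Cannot convert " + source + " to " + TARGET
                    + "; resample/downmix the audio before decoding");
        }
        return AudioSystem.getAudioInputStream(TARGET, in);
    }
}
```

Wrapping the decoded stream this way (for example, `toTargetFormat(ai)` before passing it to `setInputStream`) at least makes a format problem visible as an exception instead of a silent alignment failure.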