Using cmusphinx for text alignment in practice, how do I improve the recognition success rate?

Question

I'm using cmusphinx for text alignment. I downloaded the latest sphinx4 and built a text aligner by modifying one of the demos, using the WSJ acoustic models and dictionaries that come with the code. It works occasionally, but for a lot of recordings with quite good pronunciation, aligned against simple text, it just fails.

What could be the reason? Are the language models I use too limited, and should I download more model data to feed the recogniser? Is there any good prepackaged sphinx distribution that saves me from testing with different language models and configuring the software?

And thanks a lot :)

Here's the code that I think matters:

// Read the whole Ogg file into memory and wrap it in a stream.
byte[] bytes = readContentOfAOggFile();
ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);

// Set the reference text on the alignment grammar.
grammar = (ResetableTextAlignGrammar) cm.lookup("textAlignGrammar");
grammar.setTextAfterAllocation(referenceText);

// Decode through Java Sound (an Ogg decoder must be on the classpath).
AudioInputStream ai = AudioSystem.getAudioInputStream(inputStream);

// Look up the data source before using it, then feed it the decoded audio.
dataSource = (AudioFileDataSource) cm.lookup("audioFileDataSource");
dataSource.setInputStream(ai, null);

result = recognizer.recognize();

Please note that this code works for about half of the single-word sentences.

score 0 · Answer 1 · answered Apr 12 '14 at 06:17

What would be the reason?

You need to share the data you are trying to align to get an answer to that.

Are the language models I use too limited, and should I download more model data to feed the recognizer?

Unlikely.

Is there any good prepackaged sphinx distribution that saves me from testing with different language models and configuring the software?

Once you share your test data, it's easier to say what is going on there.

Nikolay Shmyrev


Could you take a look at my data files here: https://www.dropbox.com/sh/dw9qvk9d4m1s32q/pEpGsPPwki – tactoth Apr 13 '14 at 03:53
It's all 16k mono sound files. – tactoth Apr 13 '14 at 04:08
The files shared are ogg, not wav. What is the text you align to? – Nikolay Shmyrev Apr 13 '14 at 07:19
The text is the file name. Since sphinx uses Java Sound, ogg should be fine as long as an ogg decoder is included in the build path. I've included one, and it works in quite a few cases. – tactoth Apr 14 '14 at 12:45
@Nikolay Shmyrev I used jdogg for ogg decoding. – tactoth Apr 14 '14 at 12:47
How exactly did you decode ogg? Alignment depends on that. – Nikolay Shmyrev Apr 14 '14 at 13:18
Please see my code sample in the question I've edited. – tactoth Apr 15 '14 at 15:19
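
Since the comments above turn on how the ogg audio is decoded before alignment, one thing that could be checked is the format of the decoded stream. The following is only a sketch, not code from the question: the class name AlignmentAudioCheck and its methods are hypothetical, and it assumes the WSJ models in use expect 16 kHz, 16-bit, mono, signed PCM. It uses only the standard javax.sound.sampled API.

```java
import java.io.ByteArrayInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper (not part of sphinx4) for inspecting decoded audio.
public class AlignmentAudioCheck {

    // Assumption: the acoustic model expects 16 kHz, 16-bit, mono, signed PCM.
    public static final AudioFormat TARGET =
            new AudioFormat(16000f, 16, 1, true, false);

    // True if the format already matches the assumed target parameters.
    public static boolean matchesTarget(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1;
    }

    // Returns the stream unchanged if it matches; otherwise asks Java Sound
    // to convert it. AudioSystem throws IllegalArgumentException when no
    // installed converter supports the requested conversion.
    public static AudioInputStream toTarget(AudioInputStream in) {
        if (matchesTarget(in.getFormat())) {
            return in;
        }
        return AudioSystem.getAudioInputStream(TARGET, in);
    }

    public static void main(String[] args) {
        // One second of 16 kHz mono silence stands in for decoded ogg data.
        byte[] pcm = new byte[16000 * 2];
        AudioInputStream ai = new AudioInputStream(
                new ByteArrayInputStream(pcm), TARGET, 16000);
        System.out.println("format: " + ai.getFormat());
        System.out.println("matches target: " + matchesTarget(ai.getFormat()));
    }
}
```

In the question's snippet, the stream returned by AudioSystem.getAudioInputStream(inputStream) could be passed through toTarget before dataSource.setInputStream is called, and its format printed to see what the ogg decoder actually produced.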
