
I have written a speech recognition application using CMU Sphinx4, following the details from this link. I have defined the acoustic model, dictionary, and language model as below:

configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");

configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");

configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

With the above configuration, a 20-minute WAV file takes close to 20 minutes to transcribe. I then tried to pass a user-defined config.xml, but I didn't find a configuration manager option for passing one in the current version of Sphinx4. So I wrote my own recognizer by extending the AbstractSpeechRecognizer class (this may be useless) and changed a few parameters in config.xml, but there was still no improvement.

I downloaded video and audio from multiple sources and converted them into WAV files using FFmpeg.

The command is:

ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav

Environment Details:

Java 8

Ubuntu 14.04

RAM 4GB

I5 Processor

What I would like to know is: what am I missing here, and how can I improve the performance?

Nikolay Shmyrev
vijaym

1 Answer


Speech recognition is a resource-intensive process. Accurate speech recognition is expected to be slow, so your current speed of 1xRT (1 minute of audio takes 1 minute to decode) is reasonable. There are commercial products that use GPU acceleration and can run at 0.05xRT, but on a CPU you usually cannot run faster than 0.2xRT. So you still have to spend time on decoding.
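To make the xRT figures concrete, here is a minimal sketch of the arithmetic. The class and method names are made up for illustration and are not part of Sphinx4; the factor is simply decoding time divided by audio duration.

```java
public class RealTimeFactor {

    /** xRT = time spent decoding / duration of the audio (hypothetical helper). */
    public static double realTimeFactor(double decodeSeconds, double audioSeconds) {
        return decodeSeconds / audioSeconds;
    }

    /** Estimated decoding time for a given audio length at a given xRT. */
    public static double estimatedDecodeSeconds(double audioSeconds, double xrt) {
        return audioSeconds * xrt;
    }

    public static void main(String[] args) {
        double audioSeconds = 20 * 60; // the 20-minute file from the question

        // 20 minutes of decoding for 20 minutes of audio is 1xRT.
        System.out.println("xRT: " + realTimeFactor(20 * 60, audioSeconds));

        // Estimated decoding time in minutes at the rates mentioned in the answer.
        System.out.println("at 0.2xRT:  " + estimatedDecodeSeconds(audioSeconds, 0.2) / 60 + " min");
        System.out.println("at 0.05xRT: " + estimatedDecodeSeconds(audioSeconds, 0.05) / 60 + " min");
    }
}
```

So even at 0.2xRT, the 20-minute file would still take roughly 4 minutes to decode.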

If you want to process the file faster, you can split it into parts and decode each part separately in parallel threads or on parallel machines.
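A rough sketch of that split-and-parallelize idea, using only the standard Java concurrency classes. The names `planChunks` and `decodeInParallel` and the `decoder` callback are hypothetical and not part of Sphinx4. In a real application the callback would run a separate recognizer instance on the audio between the two timestamps; here it is a stub so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

public class ParallelDecodeSketch {

    /** Splits [0, totalSeconds) into consecutive {start, end} chunks of at most chunkSeconds. */
    public static List<long[]> planChunks(long totalSeconds, long chunkSeconds) {
        List<long[]> chunks = new ArrayList<>();
        for (long start = 0; start < totalSeconds; start += chunkSeconds) {
            chunks.add(new long[] {start, Math.min(start + chunkSeconds, totalSeconds)});
        }
        return chunks;
    }

    /** Runs the (hypothetical) decoder on every chunk in a thread pool; results keep chunk order. */
    public static List<String> decodeInParallel(List<long[]> chunks, int threads,
                                                BiFunction<Long, Long, String> decoder) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (long[] chunk : chunks) {
                Callable<String> task = () -> decoder.apply(chunk[0], chunk[1]);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that chunk is done
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Decoding failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 20 minutes of audio in 5-minute chunks, decoded on 4 threads.
        List<long[]> chunks = planChunks(20 * 60, 5 * 60);
        // Stub decoder: a real one would transcribe the audio from s to e seconds.
        List<String> parts = decodeInParallel(chunks, 4, (s, e) -> "[" + s + "s-" + e + "s]");
        System.out.println(String.join(" ", parts));
    }
}
```

Cutting at fixed timestamps can slice through a word, so in practice it is probably better to split at pauses in the audio. The thread count is also an assumption to be tuned to the available cores and memory.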

Nikolay Shmyrev
  • So there is no use in changing the configuration XML, right? And in what case do I need to change the acoustic model (just for pronunciation)? – vijaym Sep 22 '15 at 12:25
  • No, there is no use in changing the XML; we do not recommend it. I am not sure what you mean by "for pronunciation". – Nikolay Shmyrev Sep 22 '15 at 13:32