I want to build a speech recognizer for a dictation-like application. I have read the HTK Book and other tutorials, but they all cover command-and-control applications. For those applications, the set of commands and words is limited and is specified manually using a task grammar (gram file).

In my application it is not possible to specify such a grammar, because I will be processing huge audio files containing conversations between two people.

So I would like to know whether it is possible to build such an application using HTK.

Thanks...


Update after spending many sleepless nights

I got 86% accuracy using Sphinx. There was a problem with my language model (I do not know exactly what was wrong with it; I am still trying to find out), so I created a new language model using the Sphinx lmtool, a web-based language model generation service. You can get it using this link

I have also changed the acoustic model from HUB to WSJ.
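For anyone comparing numbers: a figure like the 86% above is often read as word accuracy, roughly 1 minus the word error rate (WER). That reading is an assumption here, since the post does not say how the accuracy was measured. A minimal sketch of WER as a word-level edit distance, with a hypothetical helper name `word_error_rate`:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: this is not the Sphinx scoring tool, just the
    standard Levenshtein distance applied to whitespace-separated words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("how are you today", "how were you today")
accuracy = 1.0 - wer
```

Scoring this way needs a manually transcribed reference for each test recording, which is usually the expensive part with long conversational audio.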

Shekhar

1 Answer

Yes, you can. There are two decoders for that purpose:

ATK

and

Julius

Both require you to provide a language model for large vocabulary speech recognition.

I also suggest you look at CMUSphinx, which is somewhat easier to use.
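To illustrate what a language model provides in place of a hand-written task grammar, here is a minimal sketch of a bigram model estimated from text. It is only an illustration: the function names and the tiny corpus are hypothetical, add-one smoothing is a simplification, and real toolkits use better smoothing and write the model in a format the decoder can load.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


# Hypothetical training text; in practice this would be transcripts
# that resemble the conversations to be recognized.
corpus = [
    "how are you today",
    "how are things going",
    "are you coming today",
]
unigrams, bigrams = train_bigram_lm(corpus)

# A word pair seen in the corpus scores higher than an unseen one,
# but the unseen pair is still allowed, unlike with a fixed grammar.
p_seen = bigram_prob(unigrams, bigrams, "how", "are")
p_unseen = bigram_prob(unigrams, bigrams, "how", "today")
```

The point of the sketch is that every word sequence gets some probability, so the decoder is not restricted to a fixed list of commands.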

Nikolay Shmyrev
  • Hi Nikolay, thanks for your answer. Actually, for a few weeks I was using only Sphinx, but I was not able to get the expected accuracy with it. So along with Sphinx, I am trying out the HTK toolkit. – Shekhar Mar 12 '13 at 03:17
  • It's unlikely you will get that accuracy with HTK either. A better idea would be to make adaptation work. – Nikolay Shmyrev Mar 12 '13 at 05:28
  • Okay. I had posted a question about my adaptation steps on the Sphinx forum, and you suggested using another language modelling toolkit such as mitlm or SRILM. – Shekhar Mar 12 '13 at 05:37
  • Why did you ignore the suggestion then? – Nikolay Shmyrev Mar 12 '13 at 05:37
  • I have not ignored the suggestion. I am trying to use the mitlm toolkit and, side by side, learning how to use HTK as well. – Shekhar Mar 12 '13 at 05:54
  • I got 86% accuracy using the Sphinx adaptation steps. There was some problem with the language model, so I used the Sphinx lmtool to create a new one, and then I got this 86% accuracy. Thanks a lot for your valuable answers and patience. – Shekhar Mar 15 '13 at 09:05
  • Great, I think you can get even more. Enjoy. – Nikolay Shmyrev Mar 15 '13 at 17:16