
I'm looking for a speaker-dependent speech recognition solution for an embedded device. I have looked at pocketsphinx, but since I'm still unfamiliar with it, I thought someone more experienced might know. Is it possible to implement this kind of speech recognition with pocketsphinx? Rather than using an acoustic model and a language model, it should record the audio, extract its features, and then match them against whatever is spoken later. Can this flow be implemented with pocketsphinx? If not, can someone point me in the right direction for such a solution? Thank you.

Luke Girvin
Ray

1 Answer


Is it possible to implement this kind of speech recognition with pocketsphinx?

There is no such functionality in the pocketsphinx API.

What you can do is use sphinxbase to extract the MFCC features (mel-frequency cepstral coefficients) first; see the sphinx_fe source for an example.

Then you can apply the dynamic time warping (DTW) algorithm to compare the recordings. A DTW implementation is very simple, only about 50 lines of code:

http://en.wikipedia.org/wiki/Dynamic_time_warping

There are also a few libraries that implement DTW; you can find links to them on the Wikipedia page.
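To make the DTW comparison step more concrete, here is a minimal sketch in plain Python. It is not part of pocketsphinx or sphinxbase, and the function names (`frame_distance`, `dtw_distance`, `best_match`) are made up for illustration. It assumes each recording has already been converted to a list of feature frames (for example MFCC vectors) and that Euclidean distance between frames is good enough.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature frames.

    Uses the basic recurrence with no window constraint, so it runs in
    O(len(seq_a) * len(seq_b)) time and memory.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # cost[i][j] is the cheapest alignment of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_b frame j is reused
                cost[i][j - 1],      # seq_a frame i is reused
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def best_match(templates, utterance):
    """Return the label of the stored template closest to the utterance.

    `templates` maps a label to its recorded feature sequence. Costs are
    divided by the combined length so long templates are not penalized.
    """
    def score(frames):
        return dtw_distance(frames, utterance) / (len(frames) + len(utterance))

    return min(templates, key=lambda label: score(templates[label]))
```

With toy one-dimensional "features", `best_match({"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [5.0]]}, [[0.0], [0.0], [1.0], [2.0]])` picks `"yes"`, because the utterance is just a time-stretched copy of that template. On an embedded device you would probably want to add a warping window and a rejection threshold, which this sketch leaves out.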

It would be great to see a pocketsphinx patch demonstrating a DTW implementation.

Nikolay Shmyrev