
I am using pocketsphinx to recognize spoken words in an Android application. I want to return the maximum amplitude of the audio that pocketsphinx records: whenever I speak a word, I need to get a sound level back, whether or not the decoder recognizes the word. What I have done so far: I looked into the pocketsphinx SpeechRecognizer code, where the following is currently commented out in the source file:

    /*
        while (!interrupted()
                && ((timeoutSamples == NO_TIMEOUT) || (remainingSamples > 0))) {
            int nread = recorder.read(buffer, 0, buffer.length);

            if (-1 == nread) {
                throw new RuntimeException("error reading audio buffer");
            } else if (nread > 0) {
                decoder.processRaw(buffer, nread, false, false);

                int max = 0;
                for (int i = 0; i < nread; i++) {
                    max = Math.max(max, Math.abs(buffer[i]));
                }....

It seems that this max value is calculated for a single buffer. How can I calculate it over the complete recording? Can someone give me a hint?

1 Answer


That would be it. Just make max a field of the SpeechRecognizer class and do not reset it for every buffer, only when recognition starts:

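class SpeechRecognizer {

     // Highest per-buffer RMS level seen since recognition started
     double maxLevel;

     void startRecognition() {
         maxLevel = 0.0; // reset once per recognition session, not per buffer
     }

     ....
        @Override
        public void run() {
                decoder.processRaw(buffer, nread, false, false);

                // RMS level of the buffer that was just read
                double level = 0;
                for (int i = 0; i < nread; i++) {
                    level += buffer[i] * buffer[i];
                }
                level = Math.sqrt(level / nread);
                if (maxLevel < level)
                    maxLevel = level;
      ....

}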

Here I recommend using the root mean square (RMS) instead of the simple maximum, because it is a more stable estimate of the amplitude: it is resistant to short bursts such as clicks.
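
To illustrate the difference, here is a minimal standalone sketch that computes both the peak and the RMS of a buffer of 16-bit samples. The class name LevelDemo and the test signal are made up for this example and are not part of pocketsphinx.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of pocketsphinx.
final class LevelDemo {

    /** Peak level: the largest absolute sample value among the first n samples. */
    static int peak(short[] buffer, int n) {
        int max = 0;
        for (int i = 0; i < n; i++) {
            // buffer[i] is promoted to int, so abs(-32768) does not overflow
            max = Math.max(max, Math.abs(buffer[i]));
        }
        return max;
    }

    /** RMS level: square root of the mean of the squared samples. */
    static double rms(short[] buffer, int n) {
        if (n <= 0) {
            return 0.0; // avoid dividing by zero on an empty read
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double s = buffer[i];
            sum += s * s;
        }
        return Math.sqrt(sum / n);
    }

    public static void main(String[] args) {
        // A steady, quiet signal...
        short[] quiet = new short[1000];
        Arrays.fill(quiet, (short) 100);

        // ...and the same signal with a single loud click in the middle.
        short[] click = quiet.clone();
        click[500] = 30000;

        System.out.println("quiet: peak=" + peak(quiet, quiet.length)
                + " rms=" + rms(quiet, quiet.length));
        System.out.println("click: peak=" + peak(click, click.length)
                + " rms=" + rms(click, click.length));
    }
}
```

With this made-up signal, the single click raises the peak from 100 to 30000, while the RMS only rises from 100 to roughly 954, so one stray sample distorts the peak far more than the RMS.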

It is also a good idea to return the RMS of every buffer together with the result and to update maxLevel inside the application rather than within the recognizer.
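
One way this could look is sketched below, assuming a callback interface between the recognizer and the application. LevelListener, MaxLevelTracker and BufferLevels are hypothetical names introduced for this example; pocketsphinx does not provide them, and the report method only stands in for the point in the recognizer loop where a buffer has just been read.

```java
// Hypothetical callback; the recognizer would invoke it once per audio buffer.
interface LevelListener {
    void onLevel(double rms);
}

// Application-side state: remembers the loudest buffer since the last reset.
final class MaxLevelTracker implements LevelListener {
    // volatile because the recognizer thread writes and another thread may read
    private volatile double maxLevel;

    /** Call when a new recognition session starts. */
    void reset() {
        maxLevel = 0.0;
    }

    @Override
    public void onLevel(double rms) {
        if (rms > maxLevel) {
            maxLevel = rms;
        }
    }

    double getMaxLevel() {
        return maxLevel;
    }
}

// Stand-in for the recognizer side (assumption: called after each read).
final class BufferLevels {

    /** RMS of the first nread samples; 0 for an empty read. */
    static double rms(short[] buffer, int nread) {
        if (nread <= 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < nread; i++) {
            double s = buffer[i];
            sum += s * s;
        }
        return Math.sqrt(sum / nread);
    }

    /** Computes the level of one buffer and hands it to the application. */
    static void report(short[] buffer, int nread, LevelListener listener) {
        listener.onLevel(rms(buffer, nread));
    }
}
```

In this sketch the application calls reset() when it starts recognition and reads getMaxLevel() when a result or timeout arrives. On Android the callback would typically be posted to the main thread rather than invoked directly from the recording thread.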

Nikolay Shmyrev
  • This is how I calculate a noise level in the speech recognizer class and send it back to my service:

        double sum = 0, amplitudeDb = 0;
        for (int i = 0; i < nread; i++) {
            sum += buffer[i] * buffer[i];
        }
        if (nread > 0) {
            final double amplitude = sum / nread;
            amplitudeDb = (int) Math.sqrt(amplitude);
        }
        mainHandler.post(new AmplitudeEvent(amplitudeDb));

    –  Dec 09 '15 at 11:50