I'm trying to write a simple program that detects voice activity in a .wav file using the CMU Sphinx library.

So far, I have the following:

SpeechClassifier s = new SpeechClassifier();

s.setPredecessor(dataSource);
Data d = s.getData();

while(d != null) {
    if(s.isSpeech()) {
        System.out.println("Speech is detected");
    }
    else {
        System.out.println("Speech has not been detected");
    }

    System.out.println();
    d = s.getData();
}

I get the output "Speech has not been detected" even though there is speech in the audio file. It seems as if getData() is not working the way I want it to. I want it to get the frames and then use s.isSpeech() to determine whether each frame contains speech.

I'm trying to get a separate output ("Speech is detected" vs. "Speech has not been detected") for each frame. How can I make my code better? Thanks!

1 Answer

You need to insert a DataBlocker before the SpeechClassifier:

 DataBlocker b = new DataBlocker(10); // means 10ms
 SpeechClassifier s = new SpeechClassifier(10, 0.003, 10, 0);
 b.setPredecessor(dataSource);
 s.setPredecessor(b);

Then it will process 10 millisecond frames.
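To illustrate why fixed-size frames matter, here is a minimal plain-Java sketch of per-frame classification. It is not Sphinx's implementation: the class `FrameVadSketch`, its methods, the fixed RMS threshold of 0.1, and the synthetic test signal are all assumptions made up for this example. It only shows the general idea of blocking audio into 10 ms frames and making one speech/non-speech decision per frame.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a blocker plus classifier; not part of CMU Sphinx.
class FrameVadSketch {

    // Splits samples into frames of frameMs milliseconds.
    // Assumption of this sketch: a trailing partial frame is dropped.
    static List<double[]> block(double[] samples, int sampleRate, int frameMs) {
        int frameLen = sampleRate * frameMs / 1000;
        List<double[]> frames = new ArrayList<>();
        for (int start = 0; start + frameLen <= samples.length; start += frameLen) {
            double[] frame = new double[frameLen];
            System.arraycopy(samples, start, frame, 0, frameLen);
            frames.add(frame);
        }
        return frames;
    }

    // Root-mean-square energy of one frame.
    static double rms(double[] frame) {
        double sum = 0.0;
        for (double v : frame) {
            sum += v * v;
        }
        return Math.sqrt(sum / frame.length);
    }

    // Simplistic decision: a frame counts as speech if its RMS exceeds a fixed threshold.
    static boolean isSpeech(double[] frame, double threshold) {
        return rms(frame) > threshold;
    }

    public static void main(String[] args) {
        int sampleRate = 16000;
        // 100 ms of synthetic audio: 50 ms of silence, then 50 ms of a 440 Hz tone.
        double[] samples = new double[sampleRate / 10];
        for (int i = samples.length / 2; i < samples.length; i++) {
            samples[i] = 0.5 * Math.sin(2 * Math.PI * 440 * i / sampleRate);
        }

        int n = 0;
        for (double[] frame : block(samples, sampleRate, 10)) {
            String label = isSpeech(frame, 0.1)
                    ? "Speech is detected"
                    : "Speech has not been detected";
            System.out.println("Frame " + n++ + ": " + label);
        }
    }
}
```

With blocking in place, each 10 ms frame gets its own decision, which is the per-frame output the question asks for.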

Nikolay Shmyrev
  • Thank you! But after adding that, it is now breaking at the last line (d = s.getData()). Am I supposed to get rid of the getData() method altogether? I added the DataBlocker and modified the SpeechClassifier as you advised above. – practicemakesperfect Mar 06 '17 at 14:20
  • getData() should stay. What do you mean by "breaking at the last line"? – Nikolay Shmyrev Mar 06 '17 at 21:07
  • I edited my post above. Sorry if my problem seems straightforward. I feel like it should be but I haven't been able to get it to work. – practicemakesperfect Mar 06 '17 at 21:24
  • You need a different constructor `new SpeechClassifier(10, 0.003, 10, 0);` then it will work, see my updated answer. – Nikolay Shmyrev Mar 06 '17 at 22:11
  • Awesome! One more question: when speech is not detected, I get the output "speech has not been detected at 1" twice and then "speech has been detected at 2" (those are the first 3 lines). All of the "no speech detected" lines are repeated. – practicemakesperfect Mar 07 '17 at 18:44