
For speaker adaptation in CMU Sphinx (Sphinx-4), I am using the following code snippet:

Stats stats = recognizer.createStats(nrOfClusters);
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null) {
    stats.collect(result);
}
recognizer.stopRecognition();

// Transform represents the speech profile
Transform transform = stats.createTransform();
recognizer.setTransform(transform);

What should the value of the nrOfClusters (number of clusters) parameter be to get good results? And how can this snippet be used to adapt to multiple speakers in one audio recording?

rishi007bansod

1 Answer


What should the value of the nrOfClusters (number of clusters) parameter be to get good results?

The number of clusters depends on the amount of adaptation data. The more data you have, the more clusters you can use. For example, if you have 30 seconds of speech, 1 cluster is enough. If you have 10 minutes of speech, you can use up to 32 clusters.
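As a rough illustration of that rule of thumb, one could derive a cluster count from the duration of the adaptation audio. The mapping below (one cluster per ~20 seconds, rounded up to a power of two, capped at 32) is only a heuristic sketch consistent with the 30-second and 10-minute examples above, not a Sphinx-4 API:

```java
public class ClusterHeuristic {
    /**
     * Heuristic only: suggest a power-of-two cluster count for
     * recognizer.createStats(...) from the seconds of adaptation audio.
     * Roughly one cluster per 20 s, capped at 32. Tune for your own data.
     */
    public static int suggestClusters(double seconds) {
        int raw = (int) (seconds / 20);
        if (raw <= 1) {
            return 1;                       // e.g. a 30 s clip -> 1 cluster
        }
        int pow = Integer.highestOneBit(raw);
        if (pow < raw) {
            pow <<= 1;                      // round up to the next power of two
        }
        return Math.min(pow, 32);           // e.g. 10 min of speech -> 32 clusters
    }
}
```

The result would then be passed as the nrOfClusters argument in the snippet from the question, e.g. `recognizer.createStats(ClusterHeuristic.suggestClusters(durationSec))`.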

How can this snippet be used to adapt to multiple speakers in audio?

If you know the time boundaries for each speaker, you can run adaptation for each speaker separately. There is not much sense in creating a shared transform for different speakers.
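One way to run the adaptation loop per speaker is to cut each speaker's segment out of the raw audio and feed only that segment to the recognizer. The helper below is a sketch with assumptions of my own (raw 16 kHz, 16-bit mono PCM held in a byte array; the class and method names are hypothetical, not Sphinx-4 API):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class SpeakerSlicer {
    // Assumes 16 kHz, 16-bit mono PCM: 2 bytes per sample.
    static final int BYTES_PER_SECOND = 16000 * 2;

    /**
     * Cut one speaker's segment [startSec, endSec) out of the raw audio,
     * so the adaptation loop (createStats / collect / createTransform)
     * can be run on that speaker's data alone.
     */
    public static InputStream slice(byte[] pcm, double startSec, double endSec) {
        int from = (int) (startSec * BYTES_PER_SECOND);
        int to = (int) Math.min(pcm.length, endSec * BYTES_PER_SECOND);
        return new ByteArrayInputStream(pcm, from, to - from);
    }
}
```

Each per-speaker stream would then go through the same adaptation snippet from the question, yielding one Transform per speaker; before decoding a given speaker's audio, call recognizer.setTransform(...) with that speaker's transform.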

Nikolay Shmyrev
  • Nikolay, thanks for the reply. I have some more questions regarding Sphinx-4. Where can I read more about the speaker adaptation with MLLR transformation implementation in Sphinx-4? What other techniques are there to improve speech recognition accuracy in Sphinx-4? Can we update an existing Sphinx-4 language model at runtime to get more accuracy? Also, Sphinx-4 shows 3xRT speed, so how can we improve it to get real-time speed? Is there any parallel implementation of Sphinx-4? – rishi007bansod Sep 01 '16 at 04:51