4

I am trying to build a basic emotion detector from speech using MFCCs, their deltas and delta-deltas. A number of papers report good accuracy from training GMMs on these features.

I cannot seem to find a ready-made package to do the same. I did play around with scikit-learn in Python, Voicebox and similar toolkits in Matlab, and Rmixmod, stochmod, mclust, mixtools and some other packages in R. What would be the best library for fitting GMMs to training data?

emesday
  • What do you mean by "best"? You already pointed out some packages for Gaussian mixture modeling in R, and there are others here: http://cran.r-project.org/web/views/Cluster.html (and please, next time you want to use acronyms, define them first!) – dickoa Mar 15 '13 at 16:34

2 Answers

2

The challenging part is the training data: the emotion information has to be embedded in the feature set, and the same features that encapsulate emotion must be extracted from the test signal. Testing with a GMM will only be as good as your universal background model. In my experience, a GMM can typically only separate male from female voices and a few distinct speakers. Simply feeding the MFCCs into a GMM would not be sufficient, because a GMM does not hold time-varying information, and emotional speech contains time-varying parameters such as pitch and changes in pitch over time, in addition to the frequency variations captured by the MFCC parameters. I am not saying it is impossible with the current state of technology, but it is challenging, in a good way.
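To illustrate the point about time-varying information: the deltas and delta-deltas mentioned in the question give each frame a little local temporal context by appending the slope of each coefficient over neighbouring frames. Below is a minimal, dependency-free sketch. The regression formula with a window of 2 frames on each side is a common convention and an assumption here, not something taken from this thread, and the function names `delta` and `add_dynamics` are hypothetical.

```python
def delta(frames, width=2):
    """Regression-based delta of a sequence of feature frames.

    frames: list of equal-length lists (one list of coefficients per frame).
    width:  frames used on each side (assumed value, tune for your data).
    Edges are handled by repeating the first/last frame.
    """
    if not frames:
        return []
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamics(frames, width=2):
    """Append delta and delta-delta coefficients to every frame."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy input standing in for MFCC frames: one rising and one constant coefficient.
toy_frames = [[float(t), 2.0] for t in range(6)]
toy_features = add_dynamics(toy_frames)
```

Note that this only captures short-term slopes within a few frames. Longer-range cues such as pitch contours over a whole utterance are still invisible to a frame-level GMM.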

Lalin
0

If you want to use Python, here is the GMM code from the well-known speech recognition toolkit CMU Sphinx:

http://sourceforge.net/p/cmusphinx/code/HEAD/tree/trunk/sphinxtrain/python/cmusphinx/gmm.py
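If pulling in the Sphinx training code is more than you need, another option in Python is scikit-learn's `GaussianMixture`. The sketch below shows one common setup, assumed here rather than taken from the papers mentioned in the question: fit one GMM per emotion and label an utterance by whichever model gives the highest average per-frame log-likelihood. The random arrays merely stand in for real 39-dimensional MFCC + delta + delta-delta frames, and the helper names, emotion labels and `n_components=4` are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def train_emotion_gmms(features_by_emotion, n_components=4, seed=0):
    """Fit one diagonal-covariance GMM per emotion label.

    features_by_emotion: dict mapping label -> array of shape (frames, dims).
    """
    models = {}
    for emotion, frames in features_by_emotion.items():
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",
            random_state=seed,
        )
        models[emotion] = gmm.fit(frames)  # fit() returns the fitted model
    return models


def classify(models, frames):
    """Return the label whose GMM gives the highest mean log-likelihood."""
    scores = {emotion: gmm.score(frames) for emotion, gmm in models.items()}
    return max(scores, key=scores.get)


# Synthetic placeholder data, NOT real speech features.
rng = np.random.default_rng(0)
train = {
    "angry": rng.normal(2.0, 1.0, size=(500, 39)),
    "neutral": rng.normal(-2.0, 1.0, size=(500, 39)),
}
models = train_emotion_gmms(train)

utterance = rng.normal(2.0, 1.0, size=(80, 39))  # frames of one test utterance
predicted = classify(models, utterance)
print(predicted)
```

On real data the classes will overlap far more than in this toy example, so the number of components and the covariance type would need to be tuned on held-out utterances.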

emesday