I am working on a project that requires extracting MFCC features from an audio stream. The project consists primarily of classification, although in the interest of expanding our dataset I am working on a detection algorithm to isolate the parts of the sound we are interested in classifying.
I am testing different representations and, given the nature of the data (I wish I could give more details, but I am fairly sure the professor I am working with would prefer to keep it private), I expect that delta coefficients on top of the MFCCs would be helpful.
I am extracting 40 MFCCs along with 40 delta coefficients and using those for detection. My training data consists of 40-millisecond windows centered on the parts of the audio stream I am interested in. I then train a GMM on that data.
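To make the feature layout concrete, here is a minimal sketch of stacking deltas onto MFCC frames. It is an illustration under assumptions, not the exact code used in the project: it assumes the common regression-style delta with edge padding, and the function names `delta` and `stack_with_deltas` and the `width=2` setting are hypothetical.

```python
import numpy as np


def delta(features, width=2):
    """Regression-style delta along the time axis.

    features: array of shape (n_frames, n_coeffs).
    width: frames used on each side of the current frame (assumed value).
    """
    n_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on both
    # sides. For very short segments, this padding affects most of the frames.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]   # frame t + n
        behind = padded[width - n : width - n + n_frames]  # frame t - n
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(mfcc, width=2):
    """Concatenate static coefficients and their deltas per frame."""
    return np.hstack([mfcc, delta(mfcc, width)])


# Synthetic stand-in for real MFCCs: 10 frames x 40 coefficients,
# each coefficient increasing by 1 per frame.
mfcc = np.arange(10, dtype=float)[:, None] * np.ones((1, 40))
features = stack_with_deltas(mfcc)  # shape (10, 80)
```

With 40 static and 40 delta coefficients, each frame becomes an 80-dimensional vector, and that is what the GMM would be trained on.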
For testing (and in the actual use case), I split a longer audio stream (2 seconds or so) into a sequence of MFCC frames. I compute the log likelihood of each frame under the GMM and threshold the detection based on percentiles of those log likelihood scores. I get strange results when delta coefficients are used.
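The scoring and thresholding step can be sketched roughly as follows. This is an illustration under assumptions rather than the actual pipeline: it assumes a diagonal-covariance GMM whose parameters are already fitted, and it assumes that frames scoring at or above a chosen percentile count as detections. The function names and the 90th-percentile default are hypothetical.

```python
import numpy as np


def frame_log_likelihood(frames, weights, means, variances):
    """Per-frame log likelihood under a diagonal-covariance GMM.

    frames: (n_frames, n_dims)
    weights: (n_components,)
    means, variances: (n_components, n_dims)
    """
    diff = frames[:, None, :] - means[None, :, :]
    # Log of the Gaussian normalisation constant for each component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )
    weighted = log_comp + np.log(weights)[None, :]
    # Log-sum-exp over components, shifted by the maximum for stability.
    peak = weighted.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.sum(np.exp(weighted - peak), axis=1))


def detect_by_percentile(scores, percentile=90.0):
    """Flag frames whose score is at or above the given percentile."""
    threshold = np.percentile(scores, percentile)
    return scores >= threshold, threshold
```

Because the threshold is a percentile of the scores within each stream, it is relative: a fixed fraction of frames is flagged regardless of the absolute log likelihood values.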
You can ignore the 4 figures at the bottom; those were just for visualizing my threshold scheme.
What I want to know is why does the log likelihood behave so strangely when using delta coefficients compared to when no deltas are used?
Thank you in advance. If you need clarification, please ask.