I am new to machine learning. I am trying to implement an audio language detection system based on the MFCCs, delta, delta-delta, and mel spectrum coefficients of an audio file. These features are extracted using librosa, which returns a 2D matrix of MFCCs. I want to train a Gaussian Mixture Model on them. scikit-learn takes input of shape (n_samples, n_features), but I have a 3D array of shape (n_samples, n_mfcc, n_time) obtained from librosa.feature.mfcc(). How can I provide a 3D input to a GMM?
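
To make the shape problem concrete, here is a minimal sketch of one common workaround: treat every time frame as its own sample, so the 3D array collapses to a 2D (n_frames, n_mfcc) matrix. The helper names `frames_as_samples` and `fit_language_gmm`, the synthetic data, and the GMM settings (8 diagonal-covariance components) are assumptions for illustration only, not something librosa or scikit-learn prescribes.

```python
import numpy as np


def frames_as_samples(features_3d):
    """Reshape (n_files, n_mfcc, n_time) into (n_files * n_time, n_mfcc)."""
    n_files, n_mfcc, n_time = features_3d.shape
    # Move the coefficient axis last so each row is one frame's MFCC vector.
    return features_3d.transpose(0, 2, 1).reshape(n_files * n_time, n_mfcc)


def fit_language_gmm(frames, n_components=8, seed=0):
    """Fit one GMM on all frames of one language (hypothetical helper)."""
    # Imported here so the reshaping helper works without scikit-learn.
    from sklearn.mixture import GaussianMixture

    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type="diag",
        random_state=seed,
    )
    return gmm.fit(frames)


# Synthetic stand-in for real MFCCs: 5 files, 13 coefficients, 100 frames.
rng = np.random.default_rng(0)
mfcc_3d = rng.normal(size=(5, 13, 100))

X = frames_as_samples(mfcc_3d)
print(X.shape)  # (500, 13)
```

With this layout, one option is to fit a separate GMM per language with `fit_language_gmm(X)` and label a new file by whichever model gives the highest `gmm.score(frames)`, which is the average per-frame log-likelihood.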

Also, is there a way to feed all four of the features mentioned above into the model?
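
One possible way to combine the four feature types, sketched under assumptions: stack them along the coefficient axis so that each frame gets a single, longer feature vector. The helper names `stack_features` and `extract_features` and the sizes (13 MFCCs, 40 mel bands) are hypothetical choices. The librosa calls use default frame settings, so all four matrices should share the same number of frames.

```python
import numpy as np


def stack_features(mfcc, delta, delta2, log_mel):
    """Stack per-frame features; each input is (n_coeffs, n_time).

    Returns an array of shape (n_time, total_coeffs), one row per frame.
    """
    stacked = np.vstack([mfcc, delta, delta2, log_mel])
    return stacked.T


def extract_features(path, n_mfcc=13, n_mels=40):
    """Load one audio file and build its per-frame feature matrix."""
    # Imported here so stack_features can be used without librosa.
    import librosa

    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    return stack_features(mfcc, delta, delta2, log_mel)


# Synthetic stand-ins with matching frame counts (50 frames each).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 50))
delta = rng.normal(size=(13, 50))
delta2 = rng.normal(size=(13, 50))
log_mel = rng.normal(size=(40, 50))

feats = stack_features(mfcc, delta, delta2, log_mel)
print(feats.shape)  # (50, 79)
```

The result is already in (n_samples, n_features) form, with frames as samples. Because the feature groups have very different scales, standardizing the columns before fitting the GMM may help.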

Amit K.S
  • I think you should provide a [n_samples x n_mfcc] matrix for each n_time. –  Nov 16 '18 at 18:25

0 Answers