
I have MFCC features as input (a NumPy array of shape (20, 56829)) and I am applying an HMM to build an audio vocabulary from its decoded states. The features cover 10 speakers, and I want 50 states per speaker, so I set N = 500 states; this throws a MemoryError, but it works fine with N = 100 states.

Here is the code:

import numpy as np
from hmmlearn import hmm
import librosa

def getMFCC(episode):
    filename = getPathToGroundtruth(episode)  # helper that maps an episode to its audio file
    y, sr = librosa.load(filename)  # y: audio time series, sr: sampling rate
    data = librosa.feature.mfcc(y=y, sr=sr)  # (20, n_frames) by default
    return data

def hmm_init(n, data):  # n = number of states
    states = []
    model = hmm.GaussianHMM(n_components=n, covariance_type="full")
    model.transmat_ = np.ones((n, n)) / n  # uniform transition matrix
    model.startprob_ = np.ones(n) / n      # uniform start distribution
    fit = model.fit(data.T)                # frames as rows: (56829, 20)
    z = fit.decode(data.T, algorithm='viterbi')[1]  # Viterbi state sequence
    states.append(z)
    return states

data = getMFCC(1)  # MFCC features: NumPy array of shape (20, 56829)

N = 500
D = len(data)  # number of features (20)

states = hmm_init(N, data)

In [23]: run Final_hmm.py
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/home/elancheliyan/Final_hmm.py in <module>()
     73 D= len(data)
     74 
---> 75 states = hmm_init(N,data)
     76 states.dump("states")
     77 

/home/elancheliyan/Final_hmm.py in hmm_init(n, data)
     57     model.startprob_ = np.ones(N) / N
     58 
---> 59     fit = model.fit(data.T)
     60 
     61     z=fit.decode(data.T,algorithm='viterbi')[1]

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in fit(self, X, lengths)
    434                 self._accumulate_sufficient_statistics(
    435                     stats, X[i:j], framelogprob, posteriors, fwdlattice,
--> 436                     bwdlattice)
    437 
    438             # XXX must be before convergence check, because otherwise

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/hmm.py in _accumulate_sufficient_statistics(self, stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
    221                                           posteriors, fwdlattice, bwdlattice):
    222         super(GaussianHMM, self)._accumulate_sufficient_statistics(
--> 223             stats, obs, framelogprob, posteriors, fwdlattice, bwdlattice)
    224 
    225         if 'm' in self.params or 'c' in self.params:

/cal/homes/elancheliyan/.local/lib/python3.5/site-packages/hmmlearn-0.2.1-py3.5-linux-x86_64.egg/hmmlearn/base.py in _accumulate_sufficient_statistics(self, stats, X, framelogprob, posteriors, fwdlattice, bwdlattice)
    620                 return
    621 
--> 622             lneta = np.zeros((n_samples - 1, n_components, n_components))
    623             _hmmc._compute_lneta(n_samples, n_components, fwdlattice,
    624                                  log_mask_zero(self.transmat_),

MemoryError:
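
For scale, the failing allocation is visible in the last traceback frame: lneta has shape (n_samples - 1, n_components, n_components) and holds 8-byte floats. A quick back-of-the-envelope check of its size shows why N = 500 blows up while N = 100 still fits:

n_samples = 56829  # number of MFCC frames
for n_components in (100, 500):
    # lneta: (n_samples - 1, n_components, n_components) array of float64
    nbytes = (n_samples - 1) * n_components ** 2 * 8
    print(f"N = {n_components}: {nbytes / 2 ** 30:.1f} GiB")
# N = 100: 4.2 GiB
# N = 500: 105.9 GiB

So the lneta array alone would need roughly 106 GiB at N = 500, versus about 4 GiB at N = 100.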

Is there anything wrong with my initialization?

Rangooski
  • As we've [discussed on GitHub](https://github.com/hmmlearn/hmmlearn/issues/131), nothing is wrong with your code. The issue is in the memory requirements of the classical Baum-Welch algorithm – Sergei Lebedev Jun 30 '16 at 15:44
  • Last year, people used sklearn.hmm to achieve this, but I am not able to. I tried pomegranate and it complains about singular matrices. Is there an efficient implementation I can follow for a higher number of states? – Rangooski Jun 30 '16 at 16:04
  • Can you link me to the code from last year? You can install an older version of scikit-learn (which comes with the hmm module) and run your code against it. Otherwise, you can split your observations into multiple sequences to reduce memory consumption. – Sergei Lebedev Jun 30 '16 at 16:44
  • Thanks, I am trying to reduce the number of states as well as trying to split the data. As for last year's code: their input data was very small, so it worked for them. – Rangooski Jul 01 '16 at 14:11
  • But I am afraid that splitting the data will not give the same results, because each piece of split data might use a new initialization instead of carrying over the previously learned one. – Rangooski Jul 01 '16 at 15:02
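
Following up on the splitting suggestion in the comments above, here is a minimal sketch using the lengths argument of fit (the signature is visible in the traceback). The per-sequence lneta array then shrinks from (n_samples - 1, N, N) to at most (chunk - 1, N, N), and the model is not re-initialized per chunk: sufficient statistics are accumulated across all sub-sequences within each EM iteration; only the forward-backward recursion restarts at chunk boundaries. The sizes here are scaled down so the sketch runs quickly; the mechanism is the same for N = 500.

import numpy as np
from hmmlearn import hmm

# Stand-in for the real feature matrix; the actual input would be
# data.T with shape (56829, 20).
X = np.random.randn(5000, 20)

n_states = 50  # scaled down from 500 for the sketch
chunk = 500    # frames per sub-sequence

# Split the single long sequence into sub-sequences: hmmlearn treats
# each run of lengths[i] rows of X as an independent sequence.
lengths = [chunk] * (len(X) // chunk)
if len(X) % chunk:
    lengths.append(len(X) % chunk)

model = hmm.GaussianHMM(n_components=n_states, covariance_type="full")
model.fit(X, lengths=lengths)

# Decoding the full sequence only needs the Viterbi lattice,
# O(n_samples * N), so it can run on the unsplit data.
states = model.decode(X, algorithm='viterbi')[1]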

0 Answers