
I am working on a past Kaggle competition problem in which I am writing a speech recognition algorithm. For Automatic Speech Recognition (ASR) algorithms, it is customary to process the audio into MFCCs (Mel Frequency Cepstral Coefficients).

Using a library by James Lyons (https://github.com/jameslyons/python_speech_features), I built a class called speech_recognizer that loads the wav files and converts them into MFCCs.
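For reference, the 99 rows in each MFCC array can be traced back to the framing step. The sketch below assumes 1-second clips sampled at 16 kHz and python_speech_features' default 25 ms window (`winlen=0.025`) with a 10 ms step (`winstep=0.01`). `num_frames` is a hypothetical helper that reproduces the library's framing arithmetic; it is not part of the library itself:

```python
import math

def num_frames(num_samples, samplerate=16000, winlen=0.025, winstep=0.01):
    """Hypothetical helper: frame count for a signal, assuming the library's default windowing."""
    frame_len = int(round(winlen * samplerate))    # 400 samples at 16 kHz
    frame_step = int(round(winstep * samplerate))  # 160 samples at 16 kHz
    if num_samples <= frame_len:
        return 1
    # One frame for the first window, plus one per step needed to cover the rest
    return 1 + math.ceil((num_samples - frame_len) / frame_step)

print(num_frames(16000))  # 1 s at 16 kHz -> 99 frames
```

The number of columns (26 here) depends on the feature settings, such as the number of cepstral coefficients or filterbank channels, and not on the clip length.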

Now that I have the MFCCs, how do I feed them into an HMM (Hidden Markov Model)? I want to use the hmmlearn library (https://hmmlearn.readthedocs.io/en/latest/index.html).

Although I am only working with four categories right now ("yes", "no", silence and "other"), I could potentially add more. I have about 2000 wav files per category, and each wav file produces one MFCC array.

Here is how I was thinking of training on all of the MFCCs. As you can see, I am trying to distinguish between four states: "yes", "no", silence and other.

```python
import numpy as np
from hmmlearn import hmm
import speech_recognizer  # This is the class I made to turn wav files into MFCCs

"""
Variables:

mfcc_list_yes - All of the MFCCs of wav files that say "yes"
mfcc_list_no - All of the MFCCs of wav files that say "no"
mfcc_list_silence - All of the MFCCs of wav files that contain silence
mfcc_list_other - All of the MFCCs of wav files that say something other than the categories above
"""
# r, m, s and o are speech_recognizer objects (created elsewhere) holding the
# MFCCs for the "yes", "no", silence and other wav files respectively
mfcc_list_yes = r.mfcc_list
mfcc_list_no = m.mfcc_list
mfcc_list_silence = s.mfcc_list
mfcc_list_other = o.mfcc_list
mfcc_list = mfcc_list_yes + mfcc_list_no + mfcc_list_silence + mfcc_list_other

def HMM(mfcc_list=mfcc_list, num_states=4):
    """
    num_states represents the possible states that the HMM could be in. For a speech recognition algorithm
    this would include: silence, not recognizable, and any number of other states (or possible words)
    """
    model = hmm.GaussianHMM(n_components=num_states, covariance_type="full", n_iter=100)
    for mfcc in mfcc_list:
        # Each mfcc is a single (99, 26) array from one wav file
        model.fit(mfcc)
    return model

model = HMM()
```

Q1: In hmmlearn, how do I get my model to train on all of my wav files when the X in the .fit(X) method seems to take only a sequence of floats? Each of my MFCCs is a (99, 26) numpy array rather than a simple sequence, yet that single array is only one observation of the data.
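One way this is commonly handled: hmmlearn's `fit(X, lengths=None)` accepts a 2-D array of shape (n_samples, n_features), and several independent sequences can be passed at once by concatenating them along the first axis and giving the length of each one in `lengths`. A minimal sketch of that packing step; `stack_for_hmmlearn` is a hypothetical helper name, and the hmmlearn call is shown only in comments:

```python
import numpy as np

def stack_for_hmmlearn(mfcc_list):
    """Hypothetical helper: pack per-file (n_frames_i, n_features) arrays into (X, lengths)."""
    X = np.concatenate(mfcc_list, axis=0)      # shape (sum of n_frames_i, n_features)
    lengths = [m.shape[0] for m in mfcc_list]  # frames per file, in the same order as X
    return X, lengths

# Usage with hmmlearn (not executed here):
# from hmmlearn import hmm
# X, lengths = stack_for_hmmlearn(mfcc_list_yes)
# model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
# model.fit(X, lengths)  # one fit over all "yes" files, treated as separate sequences
```

Calling `fit` once on the packed data, rather than once per file in a loop, lets a single model see every file while keeping the file boundaries apart.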

Q2: I know that hidden Markov models work by analyzing sequences and transitions from one state to another, but how do I tell the HMM that it is looking at a "yes" MFCC or a "no" MFCC? Is there a way to label the files so that, when the model encounters an MFCC pattern, it can classify what it is?
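A common pattern for isolated-word recognition with HMMs, offered here as one possible approach and not the only one, is to train one HMM per label on that label's files only. The hidden states then model acoustic segments within a word rather than the categories themselves, and a new file is assigned the label whose model gives the highest log-likelihood. The sketch below is library-agnostic: `train_per_class`, `classify` and `make_model` are hypothetical names, and any object with `fit(X, lengths)` and `score(X)` methods works. hmmlearn's `GaussianHMM` provides both methods. The `n_components=5` and `covariance_type="diag"` settings in the comments are assumptions, not tuned values:

```python
import numpy as np

def train_per_class(mfcc_by_label, make_model):
    """Fit one model per label, using only that label's MFCC arrays."""
    models = {}
    for label, mfcc_list in mfcc_by_label.items():
        X = np.concatenate(mfcc_list, axis=0)      # all files for this label, stacked
        lengths = [m.shape[0] for m in mfcc_list]  # keeps file boundaries separate
        model = make_model()
        model.fit(X, lengths)
        models[label] = model
    return models

def classify(models, mfcc):
    """Return the label whose model assigns the highest log-likelihood to one MFCC array."""
    scores = {label: model.score(mfcc) for label, model in models.items()}
    return max(scores, key=scores.get)

# Usage with hmmlearn (not executed here):
# from hmmlearn import hmm
# make_model = lambda: hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
# models = train_per_class({"yes": mfcc_list_yes, "no": mfcc_list_no,
#                           "silence": mfcc_list_silence, "other": mfcc_list_other}, make_model)
# predicted = classify(models, some_new_mfcc)
```

Adding a category then means training one more model, without retraining the existing ones.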

Q3: Based on the plots of the MFCCs, I'm tempted to use a CNN (Convolutional Neural Network) to classify these shapes into different categories. Has that worked for anyone?
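If you try a CNN, the MFCC arrays have to be packed into one fixed-size tensor, treating each array like a one-channel image. A minimal preprocessing sketch, assuming a target of 99 frames, zero-padding or cropping for clips of other lengths, and a channels-last layout (the default for Keras `Conv2D`, for example). `to_cnn_batch` is a hypothetical name, and the per-coefficient standardization is one reasonable choice among several:

```python
import numpy as np

def to_cnn_batch(mfcc_list, n_frames=99):
    """Pad/crop each (frames, coeffs) array to n_frames and stack to (N, n_frames, coeffs, 1)."""
    fixed = []
    for m in mfcc_list:
        if m.shape[0] >= n_frames:
            m = m[:n_frames]  # crop longer clips
        else:
            pad = np.zeros((n_frames - m.shape[0], m.shape[1]), dtype=m.dtype)
            m = np.vstack([m, pad])  # zero-pad shorter clips at the end
        fixed.append(m)
    batch = np.stack(fixed)[..., np.newaxis]  # add a single channel axis
    # Standardize each coefficient using statistics over all clips and frames
    mean = batch.mean(axis=(0, 1), keepdims=True)
    std = batch.std(axis=(0, 1), keepdims=True) + 1e-8
    return (batch - mean) / std
```

In practice the mean and standard deviation would be computed on the training set only and reused for validation and test data.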

Thank you. I know this is long, but I hope this makes sense.

Here are some plots of my MFCCs:

[Image: MFCC filterbank for four wav files]
