I am working on a past Kaggle competition problem in which I am writing a speech recognition algorithm. With Automatic Speech Recognition (ASR) algorithms, it is customary to process the audio into MFCCs (Mel Frequency Cepstral Coefficients).
Using a library by James Lyons (https://github.com/jameslyons/python_speech_features), I built a class called speech_recognize that loads the wav files and then processes them into MFCCs.
Now that I have the MFCC, how am I supposed to put them into the HMM (Hidden Markov Model)? I want to use the library hmmlearn found here: (https://hmmlearn.readthedocs.io/en/latest/index.html).
Although I am only working with four categories right now ("yes", "no", silence, and "other"), I could potentially add more. I have about 2000 different wav files for each category, and each wav file produces one MFCC array.
Here is what I was thinking about how to train my model on all of the MFCCs. As you can see, I am trying to distinguish between 4 states: "yes", "no", silence, and "other".
import numpy as np
from hmmlearn import hmm
import speech_recognizer  # This is the class I made to turn wav files into MFCCs

"""
Variables:
mfcc_list_yes - All of the MFCCs of wav files that say "yes"
mfcc_list_no - All of the MFCCs of wav files that say "no"
mfcc_list_silence - All of the MFCCs of wav files that contain silence
mfcc_list_other - All of the MFCCs of wav files that say something besides the other categories
"""
# r, m, s and o hold the loaded data for each category (setup not shown)
mfcc_list_yes = r.mfcc_list
mfcc_list_no = m.mfcc_list
mfcc_list_silence = s.mfcc_list
mfcc_list_other = o.mfcc_list
mfcc_list = mfcc_list_yes + mfcc_list_no + mfcc_list_silence + mfcc_list_other

def HMM(mfcc_list=mfcc_list, num_states=4):
    """
    num_states represents the possible states that the HMM could be in. For a speech recognition algorithm
    this would include: silence, not recognizable, and any number of other states (or possible words)
    """
    model = hmm.GaussianHMM(n_components=num_states, covariance_type="full", n_iter=100)
    for mfcc in mfcc_list:
        model.fit(mfcc)
    return model

HMM()
Q1: In hmmlearn, how am I supposed to get my model to train on all of my wav files when it seems the X in the .fit(X) method only takes a sequence of floats? Each of my MFCCs is a (99, 26) numpy array and not a simple sequence, yet this one (99, 26) array is only one observation in the data.
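For context on Q1: as far as the hmmlearn documentation describes it, fit(X, lengths) expects every sequence concatenated along the first axis into one (n_samples, n_features) array, plus a list giving the number of frames in each sequence. Below is a minimal sketch of that preparation, using random arrays as stand-ins for real MFCCs. The helper names stack_sequences and fit_hmm are hypothetical, and hmmlearn is imported lazily so the stacking step runs without it.

```python
import numpy as np

def stack_sequences(mfcc_list):
    """Concatenate per-file (n_frames, n_features) arrays into one array
    and record how many frames each file contributed."""
    X = np.concatenate(mfcc_list, axis=0)
    lengths = [len(m) for m in mfcc_list]
    return X, lengths

def fit_hmm(mfcc_list, num_states=4):
    """Fit one GaussianHMM on many sequences in a single fit() call."""
    from hmmlearn import hmm  # imported here so the stacking demo runs without hmmlearn
    X, lengths = stack_sequences(mfcc_list)
    model = hmm.GaussianHMM(n_components=num_states,
                            covariance_type="full", n_iter=100)
    model.fit(X, lengths)  # lengths marks where each sequence starts and ends
    return model

# Stand-in data (assumption): three fake "files" of shape (99, 26)
rng = np.random.default_rng(0)
fake_mfccs = [rng.normal(size=(99, 26)) for _ in range(3)]
X, lengths = stack_sequences(fake_mfccs)
print(X.shape, lengths)  # (297, 26) [99, 99, 99]
```

Calling fit() once per file inside a loop, by contrast, refits the model on each file in turn rather than accumulating what it learned from the earlier ones.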
Q2: I know that hidden Markov models work by analyzing sequences and transitions from one state into another, but how am I supposed to tell the HMM that it is looking at a "yes" MFCC or a "no" MFCC? Is there a way of labeling the files so that when the model encounters an MFCC pattern it can classify what it is?
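For context on Q2: a common pattern for isolated-word recognition, not specific to hmmlearn, is to train one HMM per label on that label's files only, then classify a new MFCC by whichever model gives it the highest log-likelihood. The sketch below assumes each model exposes an hmmlearn-style score(X) method that returns a log-likelihood. The classify function and the ConstantScorer stand-in are hypothetical, and the scores are made-up numbers for illustration only.

```python
import numpy as np

def classify(mfcc, models):
    """Return the label whose model assigns the highest log-likelihood."""
    scores = {label: model.score(mfcc) for label, model in models.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores

class ConstantScorer:
    """Stand-in for a trained model (assumption); in practice this would be
    an hmm.GaussianHMM fitted only on the files of one category."""
    def __init__(self, value):
        self.value = value

    def score(self, X):
        return self.value

# One model per category; the values are illustrative, not real results
models = {
    "yes": ConstantScorer(-1200.0),
    "no": ConstantScorer(-1500.0),
    "silence": ConstantScorer(-3000.0),
    "other": ConstantScorer(-1800.0),
}
label, scores = classify(np.zeros((99, 26)), models)
print(label)  # yes
```

With this layout the hidden states inside each model represent sub-word sounds rather than the categories themselves, so n_components no longer has to equal the number of categories.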
Q3: Based on the plots of the MFCCs, I'm tempted to use a CNN (Convolutional Neural Network) to classify these shapes into different categories. Has that worked for anyone?
Thank you. I know this is long, but I hope this makes sense.
Here are some plots of my MFCCs: [Figure: MFCC filterbank for four wav files]