I am currently working on a speaker verification project using Hidden Markov Models. I chose MFCCs for feature extraction, and I also intend to apply VQ to them. I have implemented the HMM and tested it on Eisner's data spreadsheet (found here: http://www.cs.jhu.edu/~jason/papers/) and got correct results.
With voice signals, however, I seem to have missed something, since I was not getting correct acceptances. (I did the probability estimation with the forward algorithm, with no scaling applied; see the log-space sketch after the code below.) I was wondering what I could have done wrong. I used scikits.talkbox's mfcc function for feature extraction and SciPy's scipy.cluster.vq for vector quantization. Here is what I have written:
from scikits.talkbox.features import mfcc
from scikits.audiolab import wavread
from scipy.cluster.vq import vq, kmeans, whiten
import numpy as np

(data, fs) = wavread(file_name)[:2]
mfcc_features = mfcc(data, fs=fs)[0]

#Vector Quantization
#collected_feats is an array of MFCC vectors pooled from 3 voice samples
np.random.seed(0) #kmeans draws from numpy's random state, not the stdlib random module
feat_std = collected_feats.std(axis=0)
collected_feats = whiten(collected_feats) #divides each feature column by feat_std
codebook = kmeans(collected_feats, no_clusters)[0]
#apply the same scaling to the test features before quantizing, and keep
#only the code indices, since vq returns a (codes, distortion) tuple
feature = vq(mfcc_features / feat_std, codebook)[0]
#feature is then used as the observation sequence for the hidden markov model
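Since I am not applying any scaling, I suspect the forward probabilities may be underflowing to zero on longer observation sequences. For reference, this is the kind of log-space forward pass I understand is the usual fix; the arrays log_pi, log_A, and log_B here are placeholders for my model's log initial, transition, and emission probabilities, not my actual HMM code:

import numpy as np

def logsumexp(x):
    #numerically stable log(sum(exp(x)))
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def log_forward(obs, log_pi, log_A, log_B):
    #forward algorithm in log space to avoid underflow
    #obs: sequence of VQ symbol indices
    #log_pi: (N,) log initial state probabilities
    #log_A: (N, N) log transition probabilities
    #log_B: (N, M) log emission probabilities over the VQ symbols
    n_states = len(log_pi)
    alpha = log_pi + log_B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha = np.array([logsumexp(alpha + log_A[:, j])
                          for j in range(n_states)]) + log_B[:, obs[t]]
    return logsumexp(alpha) #log P(obs | model)

The accept/reject decision would then compare log-likelihoods against a threshold instead of raw probabilities.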
I assumed that the default parameters of scikits' mfcc function are already suitable for speaker verification. The audio files have sampling rates of 8000 Hz and 22050 Hz. Is there something I am lacking here? I chose a codebook size of 64 for VQ. Each sample is an isolated word, at least 1 second in duration. I haven't yet found a Python function to remove the silences in the voice samples, so I use Audacity to manually truncate the silent parts (a rough sketch of the energy-based trimming I have in mind is below).
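To replace the manual Audacity step, I was thinking of something like this simple energy-based trim; the frame length and threshold here are just guesses on my part:

import numpy as np

def trim_silence(data, fs, frame_ms=20, threshold=0.01):
    #crude silence trimming: drop leading/trailing frames whose RMS
    #falls below a fixed fraction of the peak RMS
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(data) // frame_len
    frames = data[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    voiced = np.where(rms > threshold * rms.max())[0]
    if len(voiced) == 0:
        return data
    return data[voiced[0] * frame_len:(voiced[-1] + 1) * frame_len]

Any help would be appreciated. Thanks!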