
I am currently doing a project on speaker verification using Hidden Markov Models (HMMs). I chose MFCCs for feature extraction and also intend to apply vector quantization (VQ). I have implemented the HMM and tested it on Eisner's data spreadsheet found here: http://www.cs.jhu.edu/~jason/papers/ and got correct results.

Using voice signals, though, I seem to have missed something, since I was not getting correct acceptances (I did the probability estimation using the forward algorithm, with no scaling applied; see the sketch after the code below). I was wondering what I could have done wrong. I used scikits.talkbox's MFCC function for feature extraction and SciPy's cluster module for vector quantization. Here is what I have written:

import numpy as np
from scikits.talkbox.features import mfcc
from scikits.audiolab import wavread
from scipy.cluster.vq import vq, kmeans, whiten

# Read the test utterance and extract MFCCs (mfcc returns (ceps, mspec, spec))
(data, fs) = wavread(file_name)[:2]
mfcc_features = mfcc(data, fs=fs)[0]

# Vector quantization
# collected_feats is an array of MFCC vectors pooled from 3 voice samples
np.random.seed(0)  # scipy's kmeans draws its initial guesses from numpy's RNG
feat_std = np.std(collected_feats, axis=0)  # keep the per-feature scaling used by whiten
collected_feats = whiten(collected_feats)
codebook = kmeans(collected_feats, no_clusters)[0]

# The test features must be whitened with the same scaling as the training
# features, and vq returns (codes, distortion); the codes are what we need
feature = vq(mfcc_features / feat_std, codebook)[0]

# feature is then used as the observation sequence for the hidden Markov model
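
A minimal sketch of how collected_feats might be pooled, assuming each of the 3 enrollment samples goes through the same wavread/mfcc pipeline as above (the file names are placeholders):

import numpy as np

# Stack the MFCC frames of the enrollment samples into one (n_frames, n_ceps) array
enroll_files = ["sample1.wav", "sample2.wav", "sample3.wav"]
frames = []
for f in enroll_files:
    (data, fs) = wavread(f)[:2]
    frames.append(mfcc(data, fs=fs)[0])
collected_feats = np.vstack(frames)

On the scoring step mentioned above: running the forward algorithm without scaling is a common source of silent failures, because the forward probabilities underflow to zero after a few dozen frames. A sketch of the scaled recursion, assuming a transition matrix A, a discrete emission matrix B, and an initial distribution pi as NumPy arrays (the names are illustrative, not tied to any library):

def forward_log_likelihood(A, B, pi, obs):
    """Scaled forward algorithm; returns log P(obs | model)."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()                    # scaling factor for t = 0
    alpha /= c
    log_likelihood = np.log(c)
    for o in obs[1:]:
        alpha = alpha.dot(A) * B[:, o]
        c = alpha.sum()                # rescale so alpha sums to 1
        alpha /= c
        log_likelihood += np.log(c)
    return log_likelihood

With scaling, log-likelihoods can be compared across speaker models for arbitrarily long utterances; the unscaled version returns essentially 0.0 for everything once the utterance is longer than a few dozen frames.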

I assumed that the default parameters of scikits.talkbox's mfcc function are already suitable for speaker verification. The audio files have sampling rates of 8000 Hz and 22050 Hz. Is there something I am missing here? I chose a codebook size of 64 for VQ. Each sample is an isolated word, at least 1 second in duration. I haven't found a Python function yet to remove the silences in the voice samples, so I use Audacity to manually truncate the silent parts. Any help would be appreciated. Thanks!
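
For the silence removal, a simple energy-based trimmer in NumPy might be enough to replace the manual Audacity step. A rough sketch (the frame length and the threshold fraction are arbitrary starting values to tune, not taken from any library):

import numpy as np

def trim_silence(data, frame_len=256, threshold=0.02):
    """Drop frames whose RMS energy is below threshold * max RMS."""
    n_frames = len(data) // frame_len
    frames = data[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return frames[rms > threshold * rms.max()].reshape(-1)

Running this on data before the mfcc call strips the obvious leading and trailing silence; a proper voice activity detector would do better, but it avoids the manual step.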

Bobby

1 Answer


Well, I am not sure about the HMM approach, but I would recommend using GMMs. ALIZE is a great library for doing that. For silence removal, use the LIUM library. The process is called speaker diarization: the program detects where the speaker is speaking and gives you the time stamps.
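
If you want to stay in Python for a first experiment, scikit-learn's GaussianMixture is enough to try the GMM route (a sketch, not the full ALIZE pipeline; enroll_feats and test_feats stand for pooled MFCC arrays like those in the question):

from sklearn.mixture import GaussianMixture

# One GMM per enrolled speaker, trained on that speaker's pooled MFCC frames
gmm = GaussianMixture(n_components=64, covariance_type='diag', random_state=0)
gmm.fit(enroll_feats)            # enroll_feats: (n_frames, n_ceps) array

# Verification score: average per-frame log-likelihood of the test utterance
score = gmm.score(test_feats)
# Accept if the score clears a threshold tuned on genuine/impostor trials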