Defining a matrix norm to compare two MFCC matrices

Question

I would like to give a clear picture of the problem I am facing.

Scenario Construction:

I have an MFCC generator block that takes speech samples from the user and generates a rectangular matrix, say A, of order m x n, whose elements are the Mel-frequency cepstral coefficients (MFCCs). Now suppose I maintain a database of previously stored speech signals from the user. Through an LPC filter I generate a speech signal and pass it to the MFCC generator block, with the constraint that I do not give the filter all of the samples from the database; I give it only part of the speech sample. This predicted speech signal is then passed to the MFCC generator block to produce the predicted signal's cepstral coefficients, which form another rectangular matrix, say B, of the same order m x n. I then use a matrix norm, together with a heuristically chosen threshold, to compare the two matrices (find the error) and authenticate the user. If authentication fails, the number of input speech samples used for the prediction is increased linearly and the check is repeated.
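To make the comparison step concrete, here is a minimal sketch that assumes the matrix norm is the Frobenius norm of A - B. The question does not say which norm is used, so this choice, the function names, and the threshold argument are all illustrative assumptions.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of A - B for two equally sized matrices (lists of rows)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("A and B must have the same shape")
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))


def is_authenticated(a, b, threshold):
    """Accept the user when the error norm does not exceed the threshold.

    The threshold is chosen heuristically, as described in the question.
    """
    return frobenius_distance(a, b) <= threshold
```

In the scheme described above, a failed check would be followed by increasing the number of samples fed to the LPC predictor and calling `is_authenticated` again on the new B.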

Some detail on the matrices A and B defined above:

The number of rows is the number of coefficients generated per speech frame. The columns are the concatenation of the coefficients of all the frames over the entire speech sample. A and B are built in the same way. (During MFCC generation we use a window of fixed size, operate on the samples under the window to obtain the MFCCs for that frame, and then slide the window by a step smaller than the window size, i.e. successive windows overlap.)
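The overlapping-window framing can be sketched as follows. The names `window_size` and `hop_size` are hypothetical, and trailing samples that do not fill a whole window are simply dropped here, which is an assumption and not something stated in the question. Each returned frame would yield one column of m coefficients, so the number of frames is n.

```python
def frame_signal(samples, window_size, hop_size):
    """Split a sample sequence into overlapping frames.

    hop_size must be smaller than window_size so that successive
    windows overlap, as described in the text.
    """
    if not 0 < hop_size < window_size:
        raise ValueError("hop_size must be positive and smaller than window_size")
    frames = []
    start = 0
    # Stop once a full window no longer fits (assumption: no padding).
    while start + window_size <= len(samples):
        frames.append(samples[start:start + window_size])
        start += hop_size
    return frames
```

For example, 10 samples with a window of 4 and a step of 2 give 4 frames, each sharing 2 samples with its neighbour.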

Question:

I have seen this Matching two series of Mfcc coefficients link and found it somewhat useful. Still, I have a few concerns about the problem I just described. Even when an authenticated user speaks (utters the exact word that is stored in the database), the MFCCs (the position of each element in the matrix) need not match exactly those generated during prediction. If both rectangular matrices are converted into vectors, there may be a time delay between the samples. If so, the norms defined in the link I mentioned need not work even for an authenticated user. How do I fix this? Are there other ways of solving this problem?
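To make the concern concrete, here is a toy sketch with made-up numbers (not real MFCCs). It delays the same matrix by one frame and measures the error, again assuming a Frobenius norm. The helper `delay_columns` is hypothetical and pads the start with zeros.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of A - B for two equally sized matrices (lists of rows)."""
    return math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))


def delay_columns(matrix, delay):
    """Shift every row right by `delay` frames, padding the start with zeros."""
    return [[0.0] * delay + row[:len(row) - delay] for row in matrix]


# Toy 2 x 5 matrix: 2 coefficients per frame, 5 frames (illustrative values).
a = [[1.0, 2.0, 3.0, 2.0, 1.0],
     [0.5, 1.5, 2.5, 1.5, 0.5]]
b = delay_columns(a, 1)  # same content, one frame late

same = frobenius_distance(a, a)     # identical alignment gives zero error
shifted = frobenius_distance(a, b)  # one-frame delay gives a nonzero error
```

Here `same` is zero while `shifted` is clearly nonzero, even though B carries the same content as A, which is exactly why an element-wise norm can reject a genuine user.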

Thank you.

hey, are you still looking for an answer to this question? If so, have you tried/do you know about the Mahalanobis distance formula? It's the one that I always use for speech recognition, but this sounds like speaker recognition, so I'm not sure if the same principle applies. Also something to take into consideration is that Mahalanobis distance requires several training samples to be taken, rather than just working from one. — rurouniwallace, Jul 19 '12 at 20:13
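As a rough illustration of the Mahalanobis distance mentioned in the comment, the sketch below estimates a per-dimension mean and variance from several training vectors and scores a new vector against them. It assumes a diagonal covariance matrix to avoid a matrix inversion, which is a simplification introduced here and not something the comment specifies. The function names are hypothetical.

```python
import math


def fit_diagonal_model(training_vectors):
    """Estimate per-dimension mean and sample variance from training vectors."""
    n = len(training_vectors)
    if n < 2:
        raise ValueError("need at least two training vectors")
    dims = len(training_vectors[0])
    means = [sum(v[d] for v in training_vectors) / n for d in range(dims)]
    variances = [sum((v[d] - means[d]) ** 2 for v in training_vectors) / (n - 1)
                 for d in range(dims)]
    if any(var == 0 for var in variances):
        raise ValueError("zero variance in at least one dimension")
    return means, variances


def mahalanobis_diagonal(x, means, variances):
    """Mahalanobis distance under the assumed diagonal covariance."""
    return math.sqrt(sum((xi - m) ** 2 / v
                         for xi, m, v in zip(x, means, variances)))
```

This reflects the point made in the comment that several training samples are needed: with a single sample the variances cannot be estimated at all.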
