0

I'm trying to make a prototype audio recognition system by following this link: http://www.ifp.illinois.edu/~minhdo/teaching/speaker_recognition/. It is quite straightforward so there is almost nothing to worry about. But my problem is with the mel-frequency function. Here is the code as provided on the website:

function m = melfb(p, n, fs)
% MELFB         Determine matrix for a mel-spaced filterbank
%
% Inputs:       p   number of filters in filterbank
%               n   length of fft
%               fs  sample rate in Hz
%
% Outputs:      x   a (sparse) matrix containing the filterbank amplitudes
%                   size(x) = [p, 1+floor(n/2)]
%
% Usage:        For example, to compute the mel-scale spectrum of a
%               colum-vector signal s, with length n and sample rate fs:
%
%               f = fft(s);
%               m = melfb(p, n, fs);
%               n2 = 1 + floor(n/2);
%               z = m * abs(f(1:n2)).^2;
%
%               z would contain p samples of the desired mel-scale spectrum
%
%               To plot filterbanks e.g.:
%
%               plot(linspace(0, (12500/2), 129), melfb(20, 256, 12500)'),
%               title('Mel-spaced filterbank'), xlabel('Frequency (Hz)');

f0 = 700 / fs;
fn2 = floor(n/2);

lr = log(1 + 0.5/f0) / (p+1);

% convert to fft bin numbers with 0 for DC term
bl = n * (f0 * (exp([0 1 p p+1] * lr) - 1));

b1 = floor(bl(1)) + 1;
b2 = ceil(bl(2));
b3 = floor(bl(3));
b4 = min(fn2, ceil(bl(4))) - 1;

pf = log(1 + (b1:b4)/n/f0) / lr;
fp = floor(pf);
pm = pf - fp;

r = [fp(b2:b4) 1+fp(1:b3)];
c = [b2:b4 1:b3] + 1;
v = 2 * [1-pm(b2:b4) pm(1:b3)];

m = sparse(r, c, v, p, 1+fn2);

But it gave me an error:

Error using * Inner matrix dimensions must agree.

Error in MFFC (line 17) z = m * abs(f(1:n2)).^2;

When I include these 2 lines just before line 17:

size(m)
size(abs(f(1:n2)).^2)

It gave me :

ans =

    20    65


ans =

     1    65

So should I transpose the second matrix? Or should I interpret this as an row-wise multiplication and modify the code?

Edit: Here is the main function (I simply run MFCC()):

function result = MFFC()
[y Fs] = audioread('s1.wav');
% sound(y,Fs)

Frames = Frame_Blocking(y,128);
Windowed = Windowing(Frames);
spectrum = FFT_After_Windowing(Windowed);
%imagesc(mag2db(abs(spectrum)))

p = 20;
S = size(spectrum);
n = S(2);

f = spectrum;
m = melfb(p, n, Fs);
n2 = 1 + floor(n/2);
size(m)
size(abs(f(1:n2)).^2)
z = m * abs(f(1:n2)).^2;

result = z;

And here are the auxiliary functions:

function f = Frame_Blocking(y,N)
% Parameters: M = 100, N = 256
% Default : M = 100; N = 256;
M = fix(N/3);

Frames = [];
first = 1; last = N;
len = length(y);
while last <= len
    Frames = [Frames; y(first:last)'];
    first = first + M;
    last  = last + M;
end;
if last < len
    first = first + M;
    Frames = [Frames; y(first : len)];
end
f = Frames;

function f = Windowing(Frames)
S = size(Frames);
N = S(2);
M = S(1);
Windowed = zeros(M,N);
nn = 1:N;
wn = 0.54 - 0.46*cos(2*pi/(N-1)*(nn-1));
for ii = 1:M
    Windowed(ii,:) = Frames(ii,:).*wn;
end;
f = Windowed;

function f = FFT_After_Windowing(Windowed)
spectrum = fft(Windowed);
f = spectrum;
gevang
  • 4,994
  • 25
  • 33
Dang Manh Truong
  • 795
  • 2
  • 10
  • 35

2 Answers2

1

Transpose s or transpose the resulting f (it's just a matter of convention).

There is nothing wrong with the melfb function you are using, merely with the dimensions of the signal in the example you are trying to run (in the commented lines 14-17).

%  f = fft(s);
%  m = melfb(p, n, fs);
%  n2 = 1 + floor(n/2);
%  z = m * abs(f(1:n2)).^2;

The example assumes that you are using a "colum-vector signal s". From the size of your Fourier transformed f (done via fft which respects the input signal dimensions) your input signal s is a row-vector signal.

The part that gives you the error is the actual filtering operation that requires multiplying a p x n2 matrix with a n2 x 1 column-vector (i.e., each filter's response is multiplied pointwise with the Fourier of the input signal). Since your input s is 1 x n, your f will be 1 x n and the final matrix to vector multiplication for z will give an error.

gevang
  • 4,994
  • 25
  • 33
  • Thank you, I found my errors (shame on me for this silly one). But I still want to know how to do it using only the matrix , i.e. without iterating through each column. – Dang Manh Truong Jan 04 '16 at 16:58
0

Thanks to gevang's anwer, I was able to find out my mistake. Here is how I modified the code:

function result = MFFC()
[y Fs] = audioread('s2.wav');
% sound(y,Fs)

Frames = Frame_Blocking(y,128);
Windowed = Windowing(Frames);
%spectrum = FFT_After_Windowing(Windowed');
%imagesc(mag2db(abs(spectrum)))

p = 20;
%S = size(spectrum);
%n = S(2);

%f = spectrum;

S1 = size(Windowed);

n  = S1(2);
n2 = 1 + floor(n/2);
%z  = zeros(S1(1),n2);
z = zeros(20,S1(1));
for ii=1: S1(1)

    s = (FFT_After_Windowing(Windowed(ii,:)'));

    f = fft(s);
    m = melfb(p,n,Fs);
    % n2 = 1 + floor(n/2);
    z(:,ii) = m * abs(f(1:n2)).^2;
end;


%f = FFT_After_Windowing(Windowed');
%S = size(f);
%n = S(2);
%size(f)
%m = melfb(p, n, Fs);
%n2 = 1 + floor(n/2);
%size(m)
%size(abs(f(1:n2)).^2)
%z = m * abs(f(1:n2)).^2;

result = z;

As you can see, I naively assumed that the function deals with row-wise matrices, but in fact it deals with column vectors (and maybe column-wise matrices). So I iterate through each column of the input matrix and then combine the results. But I don't think this is efficient and vectorized code. Also I still can't figure out how to do column-wise operations on the input matrix (Windowed - after the windowing step), instead of using a loop.

Dang Manh Truong
  • 795
  • 2
  • 10
  • 35