4

My goal is to create a program in Octave that loads an audio file (WAV, FLAC), calculates its MFCC features, and returns them as output. The problem is that I do not have much experience with Octave and cannot get it to load the audio file, so I cannot tell whether the extraction algorithm is correct. Is there a simple way to load the file and get its features?

nstanchev
  • What exactly have you tried, and what is not working? Note that Octave 4.0.0 is the latest release and one of its main features is support for audio. – carandraug May 31 '15 at 17:24
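As the comment above notes, Octave 4.0.0 and later include built-in audio support. A minimal sketch of loading a file with `audioread`, assuming an Octave build with libsndfile (which provides WAV and FLAC reading). The helper name `load_mono` and the temporary test file are placeholders for illustration, not something from the question:

```octave
1;  % mark this file as a script, not a function file

% Hypothetical helper: load an audio file and mix it down to mono.
function [y, fs] = load_mono(path)
  [y, fs] = audioread(path);   % y is samples x channels, fs is in Hz
  if columns(y) > 1
    y = mean(y, 2);            % average the channels into one column
  endif
endfunction

% Self-contained demo: write a 1 s stereo test tone, then read it back.
fs_out = 16000;
t = (0:fs_out-1)' / fs_out;
tone = 0.5 * sin(2*pi*440*t);
demo_file = [tempname() '.wav'];
audiowrite(demo_file, [tone tone], fs_out);

[y, fs] = load_mono(demo_file);
printf('%d samples at %d Hz\n', rows(y), fs);
delete(demo_file);
```

With a real recording, only the call `[y, fs] = load_mono('your_file.flac')` is needed; the column vector `y` can then be passed to whichever MFCC routine is used.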

2 Answers

3

You can run the MFCC code from RASTAMAT in Octave; you only need to fix a few things. The fixed version is available for download here.

The changes are to set the window properly in powspec.m:

  WINDOW = hanning(winpts);

and to fix the bug in the specgram function, which is not compatible with MATLAB's.
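To illustrate what the window line affects, here is a rough sketch of a Hann-windowed power spectrum computed with plain `fft` instead of `specgram`. This is not RASTAMAT's powspec.m; the function name `frame_powspec` and its parameters are assumptions made for the example, and it uses only core Octave functions:

```octave
1;  % mark this file as a script, not a function file

% Hypothetical helper: power spectrum of Hann-windowed frames, one column per frame.
function pspec = frame_powspec(x, sr, wintime, steptime)
  winpts  = round(wintime * sr);          % window length in samples
  steppts = round(steptime * sr);         % hop between frames in samples
  nfft    = 2^nextpow2(winpts);           % FFT size: next power of two
  win     = hanning(winpts);              % same window call as in the fix above
  x = x(:);
  nframes = 1 + floor((length(x) - winpts) / steppts);
  pspec = zeros(nfft/2 + 1, nframes);
  for k = 1:nframes
    start = (k-1)*steppts + 1;
    seg   = x(start:start+winpts-1) .* win;   % apply the window to this frame
    spec  = fft(seg, nfft);                   % zero-padded FFT
    pspec(:, k) = abs(spec(1:nfft/2 + 1)).^2; % keep the one-sided power spectrum
  endfor
endfunction

% Demo on a 1 kHz tone sampled at 8 kHz.
sr = 8000;
x  = sin(2*pi*1000*(0:sr-1)'/sr);
p  = frame_powspec(x, sr, 0.025, 0.010);
[~, peak] = max(p(:, 1));
nfft = 2*(rows(p) - 1);
printf('frames: %d, peak near %g Hz\n', columns(p), (peak-1)*sr/nfft);
```

Computing the frames by hand like this avoids depending on `specgram` behaviour that differs between Octave and MATLAB, at the cost of an explicit loop.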

Nikolay Shmyrev
  • Thank you. But I get errors when trying to load *.m files in octave. Any suggestions? – nstanchev May 31 '15 at 21:48
  • It is hard to suggest anything because you didn't provide any information about the errors. – Nikolay Shmyrev May 31 '15 at 22:25
  • I have a directory with all the m-files from the site and an MP3 file, a.mp3. When I try to run the example command from the site `[d,sr] = mp3read('a.mp3',[1 30*22050],1,2);` I get `error: 'mp3read' undefined near line 9 column 11` – nstanchev Jun 01 '15 at 07:41
  • Thank you very much. Could I ask you for further help if I get more errors? – nstanchev Jun 01 '15 at 07:54
  • After running this `[mm,aspc] = melfcc(d*3.3752, sr, 'maxfreq', 8000, 'numcep', 20, 'nbands', 22, 'fbtype', 'fcmel', 'dcttype', 1, 'usecmp', 1, 'wintime', 0.032, 'hoptime', 0.016, 'preemph', 0, 'dither', 1);` I get this `error: specgram: A(I,J,...) = X: dimensions mismatch` Any suggestions? – nstanchev Jun 01 '15 at 08:48
  • Just to mention, the MP3 file I am using is not very short. – nstanchev Jun 01 '15 at 09:17
  • You can check dimensions of the input data to understand why there is a dimension mismatch. – Nikolay Shmyrev Jun 01 '15 at 15:10
  • `octave:10> ndims(d) ans = 2 octave:11> rows(d) ans = 661500 octave:12> columns(d) ans = 1 octave:13> rows(sr) ans = 1 octave:14> columns(sr) ans = 1` Is this OK? – nstanchev Jun 01 '15 at 17:05
  • I really, really appreciate this. Thank you. I do not want to be rude, but I have a few more questions. Is **mm** the matrix that contains the MFCC features? The other thing is that when executing `[im,ispc] = invmelfcc(mm, sr, 'maxfreq', 8000, 'numcep', 20, 'nbands', 22, 'fbtype', 'fcmel', 'dcttype', 1, 'usecmp', 1, 'wintime', 0.032, 'hoptime', 0.016, 'preemph', 0, 'dither', 1);` this happens: `error: invpowspec: product: nonconformant arguments (op1 is 512x1497, op2 is 513x1498)` – nstanchev Jun 01 '15 at 19:53
  • Yes, mm is the MFCC matrix. There are a few other bugs in Octave. I edited the answer again with a link to the fixed version; it should work as expected. – Nikolay Shmyrev Jun 01 '15 at 22:57
  • This is great. Is there a way of taking fewer frames, because a 20x1872 matrix is pretty big, and would that lower the quality of speech recognition? What does the aspc matrix represent? – nstanchev Jun 03 '15 at 12:11
  • @NikolayShmyrev I know this is an old one, but the link provided to the source code is dead. Could you provide an alternative link or post the code somewhere else? – jotadepicas Jun 19 '16 at 23:52
  • Found this other code by Dr. Sunil Kopparapu: https://sites.google.com/site/sunilkopparapu/Home/asks (cited in the "Computing MFCC in Octave" video: https://www.youtube.com/watch?v=oTI6c87M3Gs) – jotadepicas Jun 20 '16 at 00:13
  • Dropbox link is dead. 404! – Indra Mar 07 '17 at 08:14
2

Check out Octave functions for calculating MFCC at https://github.com/jagdish7908/mfcc-octave

For the detailed theory behind the steps used to compute MFCCs, refer to http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/

function frame = create_frames(y, Fs, Fsize, Fstep)
  y = y(:);   % make sure the signal is a column vector
  N = length(y);
  % divide the signal into frames of Fsize seconds, advancing by Fstep seconds,
  % so consecutive frames overlap by Fsize - Fstep seconds
  samplesPerFrame = floor(Fs*Fsize);
  samplesPerFramestep = floor(Fs*Fstep);
  i = 1;
  frame = [];
  while (i <= N-samplesPerFrame)
    % each frame becomes one column of the output
    frame = [frame y(i:(i+samplesPerFrame-1))];
    i = i+samplesPerFramestep;
  endwhile
endfunction

function mel = hz2mel(f)
  % convert a frequency in hertz to the mel scale
  mel = 1125*log(1+f/700);
endfunction

function hz = mel2hz(m)
  % convert a mel-scale value back to hertz
  hz = 700*(exp(m/1125) - 1);
endfunction

function bank = melbank(n, fmin, fmax, sr)
  % n    = number of filters
  % fmin = min frequency in hertz
  % fmax = max frequency in hertz
  % sr   = sampling rate in hertz
  NFFT = 512;   % number of spectrum bins used by the caller
  % figure out the bin values of the min and max frequencies
  minBin = floor(NFFT*fmin/(sr/2));
  maxBin = floor(NFFT*fmax/(sr/2));
  % convert the min and max frequencies to the mel scale
  min_mel = hz2mel(fmin);
  max_mel = hz2mel(fmax);
  % n+2 equally spaced mel points; linspace guarantees the endpoint is included
  m = linspace(min_mel, max_mel, n+2);
  h = mel2hz(m);
  % replace the frequencies in h with their respective bin values
  fbin = floor(NFFT*h/(sr/2));

  % create triangular mel filter vectors; the filter for vect is stored in
  % column vect, so column 1 stays all zeros and the result has n+1 columns
  H = zeros(NFFT, n+1);
  for vect = 2:n+1
    for k = max(minBin, 1):min(maxBin, NFFT)
      if k >= fbin(vect-1) && k <= fbin(vect)
        H(k,vect) = (k-fbin(vect-1))/(fbin(vect)-fbin(vect-1));
      elseif k >= fbin(vect) && k <= fbin(vect+1)
        H(k,vect) = (fbin(vect+1) - k)/(fbin(vect+1)-fbin(vect));
      endif
    endfor
  endfor
  bank = H;
endfunction

clc;
clear all;
close all;
pkg load signal;

% record 3 seconds of audio
Fs = 44100;
y = record(3, Fs);
% OR %
% load an existing file (audioread replaces wavread in Octave 4.0 and later)
%[y, Fs] = audioread('../FILE_PATH/');
%y = y(44100:2*44100);

% create mel filterbanks
minFreq = 500;     % minimum cutoff frequency in Hz
maxFreq = 10000;   % maximum cutoff frequency in Hz
% melbank(number_of_banks, minFreq, maxFreq, sampling_rate)
foo = melbank(30, minFreq, maxFreq, Fs);

% create frames: 25 ms long, one every 10 ms
frames = create_frames(y, Fs, 0.025, 0.010);
NF = columns(frames);   % number of frames

PXX = [];
for i = 1:NF
  % calculate the periodogram of the frame
  P = periodogram(frames(:,i), [], 1024, Fs);
  % apply the mel filters to the power spectrum
  P = foo.*P(1:512);
  % sum the energy in each filter and take the logarithm
  P = log(sum(P));
  % take the DCT of the log filterbank energies, discarding the first one
  % because it is -Inf after the log (column 1 of foo is all zeros)
  P = dct(P(2:end));
  % coefficients are stacked row-wise for each frame
  PXX = [PXX; P];
endfor
% transpose so that each column holds the coefficients of one frame
PXX = PXX';
plot(PXX);
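The same pipeline (frame, power spectrum, mel filterbank, log, DCT) can also be sketched more compactly. This is not the answer's code: it swaps `periodogram` and `dct` for a plain `fft` and a hand-built DCT-II basis so that it needs only core Octave, and it floors the energies with `eps` instead of dropping a coefficient. The names `mel_filterbank` and `mfcc_frames` and all parameter values are assumptions chosen for the example:

```octave
1;  % mark this file as a script, not a function file

% Hypothetical helper: n triangular mel filters as an n x (nfft/2+1) matrix.
function H = mel_filterbank(n, fmin, fmax, nfft, sr)
  hz2mel = @(f) 1125 * log(1 + f/700);
  mel2hz = @(m) 700 * (exp(m/1125) - 1);
  edges = mel2hz(linspace(hz2mel(fmin), hz2mel(fmax), n + 2));  % n+2 edge frequencies
  bins  = floor(nfft * edges / sr) + 1;          % 1-based FFT bin of each edge
  H = zeros(n, nfft/2 + 1);
  for j = 1:n
    lo = bins(j); mid = bins(j+1); hi = bins(j+2);
    H(j, lo:mid) = linspace(0, 1, mid - lo + 1); % rising slope
    H(j, mid:hi) = linspace(1, 0, hi - mid + 1); % falling slope
  endfor
endfunction

% Hypothetical helper: MFCCs for frames stored one per column.
function C = mfcc_frames(frames, H, nfft, ncep)
  win  = hamming(rows(frames));
  spec = fft(frames .* win, nfft);               % FFT of each windowed column
  P    = abs(spec(1:nfft/2 + 1, :)).^2;          % one-sided power spectrum
  E    = log(max(H * P, eps));                   % log energies, floored to avoid -Inf
  n    = rows(H);
  D    = cos(pi * (0:ncep-1)' * ((0:n-1) + 0.5) / n);  % unnormalized DCT-II basis
  C    = D * E;                                  % ncep x number of frames
endfunction

% Demo on 1 s of noise standing in for real audio.
sr = 16000; nfft = 512;
x = randn(sr, 1);
framelen = round(0.025 * sr);                    % 25 ms frames
step     = round(0.010 * sr);                    % 10 ms hop
starts   = 1:step:(length(x) - framelen + 1);
frames   = x(starts + (0:framelen-1)');          % framelen x nframes via broadcasting
H = mel_filterbank(26, 300, 8000, nfft, sr);
C = mfcc_frames(frames, H, nfft, 13);
disp(size(C));
```

Building the frame matrix by indexing and multiplying by the filterbank matrix processes all frames at once, which avoids growing `PXX` inside a loop.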
  • Welcome to SO! Don't post links to websites, as they might break or be taken down in the future. Instead, explain the solution. – Abhishek Dutt Jul 12 '21 at 05:28