My goal is to write an Octave program that loads an audio file (WAV, FLAC), computes its MFCC features, and returns them as output. I do not have much experience with Octave and cannot get it to load the audio file, so I cannot tell whether my extraction algorithm is correct. Is there a simple way to load the file and get its features?
-
What exactly have you tried, and what is not working? Note that Octave 4.0.0 is the latest release, and one of its main features is support for audio. – carandraug May 31 '15 at 17:24
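As a rough illustration of the audio support mentioned in the comment above, here is a minimal sketch using `audiowrite`, `audioread` and `audioinfo`, which ship with Octave 4.0 and later. The test tone, the 16 kHz rate and the temporary file name are assumptions made only so the example is self-contained. Which formats can be read (WAV, FLAC, ...) depends on how your Octave was built.

```octave
% Write a short test tone so the example has a file to read.
fs = 16000;                          % sample rate in Hz (arbitrary choice)
t = (0:fs-1)' / fs;                  % one second of sample times
tone = 0.5 * sin(2*pi*440*t);        % 440 Hz sine as a column vector
fname = [tempname() '.wav'];         % hypothetical temporary file
audiowrite(fname, tone, fs);

% Load it back: y holds the samples (one column per channel),
% sr is the sample rate stored in the file.
[y, sr] = audioread(fname);
info = audioinfo(fname);             % metadata only, no samples loaded
printf('%d samples, %d channel(s), %d Hz, %.1f s\n', ...
       rows(y), columns(y), sr, info.Duration);

% Mix down to mono before feature extraction (a no-op for mono files).
y = mean(y, 2);
delete(fname);
```

For your own recording, replace `fname` with its path and skip the `audiowrite` and `delete` lines.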
2 Answers
3
You can run the MFCC code from RASTAMAT in Octave; you only need to fix a few things. The fixed version is available for download here.
The changes are to set the window properly in powspec.m
WINDOW = hanning(winpts);
and to fix the bug in the specgram function, which is not compatible with Matlab.

Nikolay Shmyrev
-
Thank you. But I get errors when trying to load the *.m files in Octave. Any suggestions? – nstanchev May 31 '15 at 21:48
-
It is hard to suggest anything because you didn't provide any information about the errors. – Nikolay Shmyrev May 31 '15 at 22:25
-
I have a directory with all the m-files from the site and an MP3 file, a.mp3. When I try to run the example command from the site `[d,sr] = mp3read('a.mp3',[1 30*22050],1,2);` I get `error: 'mp3read' undefined near line 9 column 11` – nstanchev Jun 01 '15 at 07:41
-
Thank you very much. Could I ask you for further help if I get more errors? – nstanchev Jun 01 '15 at 07:54
-
After running this `[mm,aspc] = melfcc(d*3.3752, sr, 'maxfreq', 8000, 'numcep', 20, 'nbands', 22, 'fbtype', 'fcmel', 'dcttype', 1, 'usecmp', 1, 'wintime', 0.032, 'hoptime', 0.016, 'preemph', 0, 'dither', 1);` I get this `error: specgram: A(I,J,...) = X: dimensions mismatch` Any suggestions? – nstanchev Jun 01 '15 at 08:48
-
You can check dimensions of the input data to understand why there is a dimension mismatch. – Nikolay Shmyrev Jun 01 '15 at 15:10
-
`octave:10> ndims(d) ans = 2 octave:11> rows(d) ans = 661500 octave:12> columns(d) ans = 1 octave:13> rows(sr) ans = 1 octave:14> columns(sr) ans = 1` Is this OK? – nstanchev Jun 01 '15 at 17:05
-
I really appreciate this. Thank you. I do not want to be rude, but I have a few more questions. Is **mm** the matrix that contains the MFCC features? Also, when executing `[im,ispc] = invmelfcc(mm, sr, 'maxfreq', 8000, 'numcep', 20, 'nbands', 22, 'fbtype', 'fcmel', 'dcttype', 1, 'usecmp', 1, 'wintime', 0.032, 'hoptime', 0.016, 'preemph', 0, 'dither', 1);` I get `error: invpowspec: product: nonconformant arguments (op1 is 512x1497, op2 is 513x1498)` – nstanchev Jun 01 '15 at 19:53
-
Yes, mm is the MFCC matrix. There are a few other bugs in Octave. I edited the answer again with a link to the fixed version; it should work as expected. – Nikolay Shmyrev Jun 01 '15 at 22:57
-
This is great. Is there a way to take fewer frames? A 20x1872 matrix is pretty big. Would that lower the quality of speech recognition? And what does the aspc matrix represent? – nstanchev Jun 03 '15 at 12:11
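On the frame count: the 1872 columns follow from the window and hop lengths passed to `melfcc`. The sketch below reproduces that number, assuming the window and hop are rounded to whole samples and only complete windows are kept. Both are assumptions about the framing, not something read from the RASTAMAT source. The helper `frame_count` is hypothetical, and `sr = 22050` is inferred from the `30*22050` in the `mp3read` call.

```octave
% Hypothetical helper: number of complete frames for a given window and hop.
% Assumes window and hop lengths are rounded to whole samples.
frame_count = @(nsamples, sr, wintime, hoptime) ...
    1 + floor((nsamples - round(wintime * sr)) / round(hoptime * sr));

nsamples = 661500;   % rows(d) reported in the comments
sr = 22050;          % assumed: 661500 samples / 30 s

n16 = frame_count(nsamples, sr, 0.032, 0.016)   % 1872, hoptime used above
n32 = frame_count(nsamples, sr, 0.032, 0.032)   % 936, hop doubled
```

So raising `'hoptime'` is the direct way to get fewer frames: doubling it roughly halves the column count, at the cost of coarser time resolution. Whether that hurts recognition depends on the recognizer and the data.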
-
@NikolayShmyrev I know this is an old one, but the link provided to the source code is dead. Could you provide an alternative link or post the code somewhere else? – jotadepicas Jun 19 '16 at 23:52
-
Found this other code by Dr. Sunil Kopparapu: https://sites.google.com/site/sunilkopparapu/Home/asks (cited in the "Computing MFCC in Octave" video: https://www.youtube.com/watch?v=oTI6c87M3Gs) – jotadepicas Jun 20 '16 at 00:13
-
2
Check out the Octave functions for calculating MFCC at https://github.com/jagdish7908/mfcc-octave
For the theory behind each step of computing MFCCs, refer to http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/
function frame = create_frames(y, Fs, Fsize, Fstep)
  % Split the signal y into overlapping frames, one frame per column.
  % Fsize = frame length in seconds, Fstep = step between frame starts in seconds
  y = y(:);  % force a column vector (assumes mono input)
  N = length(y);
  samplesPerFrame = floor(Fs*Fsize);
  samplesPerFramestep = floor(Fs*Fstep);
  i = 1;
  frame = [];
  while (i <= N-samplesPerFrame)
    frame = [frame y(i:(i+samplesPerFrame-1))];
    i = i+samplesPerFramestep;
  endwhile
endfunction
function mel = hz2mel(f)
  % convert a frequency in Hz to the mel scale
  mel = 1125*log(1+f/700);
endfunction
function hz = mel2hz(m)
  % convert a mel-scale value back to a frequency in Hz
  hz = 700*(exp(m/1125) - 1);
endfunction
function bank = melbank(n, fmin, fmax, sr)
  % n = number of banks
  % fmin = min frequency in hertz
  % fmax = max frequency in hertz
  % sr = sampling rate in hertz
  % (named fmin/fmax so they do not shadow the built-in min and max)
  NFFT = 512;
  % figure out bin value of min and max freq
  minBin = floor((NFFT)*fmin/(sr/2));
  maxBin = floor((NFFT)*fmax/(sr/2));
  % convert the min, max to the mel scale
  min_mel = hz2mel(fmin);
  max_mel = hz2mel(fmax);
  % n filters need n+2 edge points, evenly spaced on the mel scale
  m = linspace(min_mel, max_mel, n+2);
  h = mel2hz(m);
  % replace frequencies in h with their respective bin values
  fbin = floor((NFFT)*h/(sr/2));
  % create triangular mel filter vectors
  % note: filters are stored in columns 2..n+1, so column 1 stays all zeros
  H = zeros(NFFT,n);
  for vect = 2:n+1
    for k = minBin:maxBin
      if k >= fbin(vect-1) && k <= fbin(vect)
        H(k,vect) = (k-fbin(vect-1))/(fbin(vect)-fbin(vect-1));
      elseif k >= fbin(vect) && k <= fbin(vect+1)
        H(k,vect) = (fbin(vect+1) - k)/(fbin(vect+1)-fbin(vect));
      endif
    endfor
  endfor
  bank = H;
endfunction
clc;
clear all;
close all;
pkg load signal;
% record 3 seconds of audio
Fs = 44100;
y = record(3, Fs);
% OR %
% load an existing file (audioread replaces wavread, which was removed from recent Octave releases)
%[y, Fs] = audioread('../FILE_PATH/');
%y = y(44100:2*44100);
% create mel filterbanks
minFreq = 500; % minimum cutoff frequency in Hz
maxFreq = 10000; % maximum cutoff frequency in Hz
% melbank(number_of_banks, minFreq, maxFreq, sampling_rate)
foo = melbank(30,minFreq,maxFreq,Fs);
% create frames: 25 ms long, one every 10 ms
frames = create_frames(y, Fs, 0.025, 0.010);
NF = columns(frames);
PXX = [];
for i = 1:NF
  % calculate the periodogram (power spectrum) of the frame
  P = periodogram(frames(:,i),[], 1024, Fs);
  % apply mel filters to the power spectrum
  P = foo.*P(1:512);
  % sum the energy in each filter and take the logarithm
  P = log(sum(P));
  % discard the first entry: column 1 of the filterbank is all zeros,
  % so its log energy is -Inf; then take the DCT of the remaining
  % log filterbank energies
  P = dct(P(2:end));
  % coefficients are stacked row-wise, one row per frame
  PXX = [PXX; P];
endfor
% transpose so that each column holds the coefficients of one frame
PXX = PXX';
plot(PXX);
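To sanity-check the mel mapping used by `melbank` above, the following sketch prints the filter edge frequencies for the same settings (30 filters between 500 Hz and 10000 Hz). It repeats the `hz2mel`/`mel2hz` formulas as anonymous functions so it runs on its own. The variable names `edges_mel`, `edges_hz` and `spacing` are just illustrative.

```octave
% Same formulas as hz2mel/mel2hz above, written as anonymous functions.
hz2mel = @(f) 1125 * log(1 + f / 700);
mel2hz = @(m) 700 * (exp(m / 1125) - 1);

n = 30;            % number of filters, as in the script above
minFreq = 500;     % Hz
maxFreq = 10000;   % Hz

% n triangular filters need n+2 edge points, evenly spaced on the mel scale.
edges_mel = linspace(hz2mel(minFreq), hz2mel(maxFreq), n + 2);
edges_hz = mel2hz(edges_mel);

% Even spacing in mel means the gaps in Hz widen with frequency.
spacing = diff(edges_hz);
printf('first gap: %.1f Hz, last gap: %.1f Hz\n', spacing(1), spacing(end));
```

The widening gaps are the point of the mel scale: filters are narrow at low frequencies and broad at high ones.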

Jagdish Chaudhary
-
Welcome to SO! Don't post bare links to websites, as they might break or be taken down in the future. Instead, explain the solution. – Abhishek Dutt Jul 12 '21 at 05:28