
I'm writing a program that removes vocals from a song using the FFT. Before implementing it in C#, I decided to test the frequency-removal algorithm in MATLAB, but I can't get the result shown in the example; there's only noise. I've tried various ratio ranges (e.g. 0.7 - 1.5), but it's still noise. What am I doing wrong? Please help me get this right. Thanks in advance!

[y, fs] = audioread('Song.wav'); % wavread in MATLAB releases before R2012b
left = y(:,1);
right = y(:,2);
fftL = fft(left);
fftR = fft(right);

for i = 1:length(fftL)                % iterate over every FFT bin (683550 in my example)
  dif = abs(fftL(i,1) / fftR(i,1));   % per-bin L/R magnitude ratio
  if (dif > 0.7 && dif < 1.5)         % bin has similar energy in both channels
    fftL(i,1) = 0;
    fftR(i,1) = 0;
  end
end

leftOut = ifft(fftL);
rightOut = ifft(fftR);
yOut(:,1) = leftOut;
yOut(:,2) = rightOut;

audiowrite('tmp.wav', yOut, fs); % note: the older wavwrite took (y, fs, filename)
marko
SergeyLazarev
    I think this is more of a [dsp.stackexchange.com](http://dsp.stackexchange.com/) kind of problem... – Eitan T Jan 22 '13 at 14:49
  • The user has now moved it himself: http://dsp.stackexchange.com/questions/7597/matlab-removing-vocals – Dennis Jaheruddin Jan 22 '13 at 15:00
  • ok, OP just got asked to flag for migration to stackoverflow at dsp.stackexchange. I'll answer here... – KlausCPH Jan 22 '13 at 15:36
  • [This](http://stackoverflow.com/questions/14393677/removing-vocals-from-sound-file-in-matlab/14395472#14395472) is a very similar question, trying a similar approach. – marko Jan 23 '13 at 01:42

2 Answers


From the code I can see that you simply classify frequency content as being a vocal if it is "equal" in strength between left and right (equal being defined as a ratio in between 0.7 and 1.5). I'm not familiar with your reasons for this scheme, but it may actually yield a decent result.

What you are doing wrong most likely has to do with the FFT size and the fact that you are treating the complete signal in one go, so to speak.

Vocals in a song vary over time, so your masking has to vary as well. This means you have to break your signal into frames in the time domain and do your FFT and masking separately for each frame. You should also consider using an overlap between frames.
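A minimal sketch of this frame-by-frame processing, assuming the stereo signal `y` from the question; the frame length, hop size, window choice, and the 0.7 - 1.5 thresholds are illustrative values, not tuned ones:

```matlab
% Frame-based centre-channel removal (sketch)
N    = 2048;             % frame length
hop  = N/2;              % 50% overlap
win  = hanning(N);       % window to reduce spectral leakage
yOut = zeros(size(y));

for start = 1:hop:(size(y,1) - N + 1)
    idx  = start:start+N-1;
    fftL = fft(y(idx,1) .* win);      % windowed FFT of this frame
    fftR = fft(y(idx,2) .* win);

    ratio = abs(fftL ./ fftR);              % per-bin L/R magnitude ratio
    mask  = (ratio > 0.7) & (ratio < 1.5);  % bins with similar energy in L and R
    fftL(mask) = 0;
    fftR(mask) = 0;

    % overlap-add the processed frames back together; the Hann window at
    % 50% overlap sums to (approximately) a constant gain
    yOut(idx,1) = yOut(idx,1) + real(ifft(fftL));
    yOut(idx,2) = yOut(idx,2) + real(ifft(fftR));
end
```

The mask is recomputed for every frame, so it can follow the vocal as it changes over time, which the single whole-signal FFT cannot.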

Regards

KlausCPH
  • In addition to all of the points above, [Windowing](http://en.wikipedia.org/wiki/Window_function) is also required to minimise bleed between adjacent bins of strong frequencies. There is a distinct trade-off between time and frequency resolution to be made to avoid nasty smearing of transients. The OP's code above is the extreme of this! The rationale is that vocals are often panned to centre, whereas instruments are more likely to be panned to one side or the other. It's not a particularly robust assumption to make, and there is plenty of programme material that is not like this. – marko Jan 23 '13 at 01:40
  • Very cool. Would love to hear an example :-) If you have the time, feel free to upload a pre- and post-processed example to YouTube or similar and post a link here. – KlausCPH Jan 23 '13 at 08:30

Maybe this helps someone:

[file, path] = uigetfile('*.wav','Select a .wav file');
if file == 0
    return
end

[y,Fs]= audioread(file);

if size(y,2) == 1
    msgbox('The selected file is Mono. This algorithm is applicable only for Stereo files.');
    return;
end

% fc=input('Enter Cutoff Frequency (HPF):');
% fc=round(fc);

fc = 3000;                     % cutoff frequency in Hz
if fc > 20
    fp = fc + 5;               % passband edge, 5 Hz above the cutoff
    ws = fc / (Fs/2);          % normalised stopband edge
    wp = fp / (Fs/2);          % normalised passband edge
    [n, wn] = buttord(wp, ws, 0.5, 80);  % estimate the required filter order
    [b, a]  = butter(n, wn, 'high');     % use the estimated order, not a fixed 5
    channel_2 = filtfilt(b, a, y(:,2));  % zero-phase high-pass of the right channel
else
    channel_2 = y(:,2);
end

background = y(:,1) - channel_2;  % subtract to cancel centre-panned (vocal) content

% Write it to a file
audiowrite(fullfile(cd, 'background.wav'), background, Fs);