I have 50 WAV files of glass-breaking sounds and 50 WAV files of normal sounds. Every file is 1 second long. I need to classify the sounds using a neural network. How can I extract features from the sound files, and what kind of neural network should I use?

Here is the code that my friend and I have been working on:

% Network input extraction (read the trimmed audio data)

p = which('audio_000.wav');
file_list = dir ([fileparts(p)  filesep 'audio_***.wav']);
% file 000-050 is glass break 
% file 051-100 is normal 
file_names = {file_list.name}';
n = length(file_names);
inp = zeros (n,6);

for k=1:n
    %read WAV file 
    aud1=audioread(file_names{k});
    a=16000;
    aud2=zeros(a,1);
    [m,o]=size(aud1);
    j=1:m;
    aud2(j)=aud1(j);

    % Fourier transform
    % extract features

    Fs=1000;
    nfft=500;
    X=fftshift(fft(aud2,nfft));
    X=X(1:nfft);
    mx=abs(X);
    f= -Fs/2:Fs/(nfft-1):Fs/2;

    % sort to find the peaks of the FFT
    % keep the frequencies of the 5 highest peaks

    mx1=mx;
    f1=f;
    s=zeros(nfft,2);
    for i=1:nfft % zero out bins at or below 1 Hz, then store amplitude/frequency pairs
        if f1(i)<=1
            mx1(i)=0;
        end
        s(i,1)=mx1(i); % use the masked amplitudes so the zeroing above takes effect
        s(i,2)=f1(i);
    end
    s1=sortrows(s);
    s2=s1;
    for i=nfft:-1:2
        if s1(i,1)>s1(i-1,1) && s1(i,2)>s1(i-1,2)
            s2(i-1,1)=0;
        end
    end
    s3=sortrows(s2);
    s4=s3;

    for i=nfft:-1:2
        if s3(i,1)>s3(i-1,1) && s3(i,2)-s3(i-1,2)>-1
            s4(i-1,1)=0;
        end
    end
    s5=sortrows(s4);

    %get length of WAV files 
    l=m/10e4;

    % Input vector for the neural network
    % 5 inputs from the FFT
    % 1 input from the audio length

    inp(k,1:end)=[s3(nfft:-1:nfft-4,2)' l];
end
figure, plot(aud1); 
figure, plot(f,mx);

% define target
tar=zeros(n,1);

% tar(1:50): glass break
% tar(51:100): normal sound

tar(1:50,1)=0;
tar(51:100,1)=1;

trinput=inp';
trtarget=tar';

disp('Press any key to continue');
pause;

% neural network training

nnstart; %start neural network tool 

1 Answer

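Let X be the matrix of features you extracted with the FFT, with one column per sound file (like trinput in your code, which is 6-by-100). Let T be the target row vector, where T(i) = 0 if the i-th sound file contains a normal sound and T(i) = 1 if it contains the sound of breaking glass. Note that this is the opposite of the labels in your tar vector; either convention works as long as you use it consistently.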
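As an illustration of the layout described above, here is a minimal sketch of how one feature column could be built from a signal and how X and T are shaped. It is not the asker's method: the 440 Hz test tone, the 16 kHz sampling rate, and the choice of "5 largest bins plus duration" are assumptions made only to keep the example self-contained. With real data the signal and its true sampling rate would come from [x, Fs] = audioread(fileName).

```matlab
% Hypothetical stand-in for one audio file: a 1 s, 440 Hz tone at 16 kHz.
% With real data you would use: [x, Fs] = audioread(fileName);
Fs = 16000;                          % sampling rate in Hz (assumed)
t  = (0:Fs-1)' / Fs;                 % 1 second of sample times
x  = sin(2*pi*440*t);

% One-sided magnitude spectrum of the whole signal
N    = length(x);
mag  = abs(fft(x));
mag  = mag(1:floor(N/2)+1);          % keep the bins from 0 Hz to Fs/2
freq = (0:floor(N/2))' * (Fs / N);   % frequency of each kept bin in Hz

% Frequencies of the 5 largest bins (a crude stand-in for real peak picking)
[~, idx]  = sort(mag, 'descend');
peakFreqs = freq(idx(1:5));

% One feature column per file: 5 frequencies plus the duration in seconds
featureCol = [peakFreqs; N / Fs];    % 6-by-1

% For M files, X is 6-by-M and T is 1-by-M (1 = glass break, 0 = normal)
M = 4;                               % illustrative number of files
X = repmat(featureCol, 1, M);        % here every column is the same toy signal
T = [1 1 0 0];
```

Taking the FFT over the full signal and deriving the frequency axis from the file's own sampling rate keeps the peak frequencies in real Hz, which makes the features comparable across files.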

You set the layer size of your neural network:

layerSize = 10;

and initialize your network

net = patternnet(layerSize);

then you can divide your data into training, validation, and test sets

net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
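To make the ratios above concrete, the following sketch mimics a random 70/15/15 split of 100 samples by hand. The toolbox performs the division internally during training, so this is only an illustration; the variable names are hypothetical and the toolbox's exact rounding may differ.

```matlab
% Hand-rolled illustration of a random 70/15/15 split (not the toolbox code)
numSamples = 100;                    % e.g. 50 glass-break + 50 normal files
trainRatio = 0.70;
valRatio   = 0.15;

order    = randperm(numSamples);     % random ordering of sample indices
numTrain = round(trainRatio * numSamples);
numVal   = round(valRatio * numSamples);

trainInd = order(1:numTrain);                      % used to fit the weights
valInd   = order(numTrain+1 : numTrain+numVal);    % used for early stopping
testInd  = order(numTrain+numVal+1 : end);         % held out for the final check
```

With only 100 files, the test set here contains just 15 samples, so a single train/test split gives a fairly noisy estimate of the error.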

Now you can train your network

[net,tr] = train(net,X,T);

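and evaluate it. Note that net(X) runs the network on all samples, not only on the test set; the indices of the held-out test samples are stored in tr.testInd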

outputs = net(X);
errors = gsubtract(T,outputs);
performance = perform(net,T,outputs);
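Since the outputs above cover every sample, it can help to look at the held-out test samples separately. The sketch below uses made-up values for T, outputs, and testInd so it runs without the toolbox; with a trained network you would take outputs = net(X) and testInd = tr.testInd instead. The 0.5 threshold is an assumption that fits a single output row with 0/1 targets.

```matlab
% Hypothetical values; with a trained network use outputs = net(X)
% and testInd = tr.testInd
T       = [1   1   1   0   0   0   1   0  ];
outputs = [0.9 0.8 0.4 0.1 0.3 0.6 0.7 0.2];
testInd = [3 6 7 8];

predicted = outputs > 0.5;                      % threshold the network output

% Fraction of correct predictions on all samples and on the test samples only
overallAccuracy = mean(predicted == T);
testAccuracy    = mean(predicted(testInd) == T(testInd));
```

In this toy example the overall accuracy is higher than the test accuracy, which is the typical pattern when a network is evaluated on data it was trained on.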

and finally

view(net)

Note that you can find this code, with a more detailed explanation, on the MathWorks website.

StefanM