
Before I begin, I'd just like to preface this by saying that I only started coding in October, so excuse me if it's a little clumsy.

I've been trying to make an MLP for a project I've been working on. I have the hidden layer (sigmoid) and the output layer (softmax), and they seem to be working properly. However, when I run the backpropagation, my error initially decreases and then alternates between two different values.

See the image below: a graph of epoch (x) against error (y).

I have tried multiple learning rates, different numbers of epochs, and different random initial weights, everything I can think of, but I keep getting the same problem. I have also normalised the data and the targets to values between 0 and 1.
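
(By normalised I mean a simple min-max scaling to the 0-1 range, something like the line below; `data` here just stands in for my input/target matrices.)

% Min-max scaling of the whole matrix to [0, 1] (illustrative only).
data = (data - min(data(:))) ./ (max(data(:)) - min(data(:)));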

After lowering the learning rate considerably, I get a smoother graph, but there still isn't much error reduction after the first few epochs. For example, when using 10/50/100 epochs of training there is very little reduction after 4-5 epochs. Have I hit a local minimum, and can I improve on this?

Can anyone shed some light on why this is happening and suggest some code that could resolve the problem? I would really appreciate it.

I have enclosed the code I used for the backpropagation algorithm.

function [deltaW1, deltaW2, error] = BackProp(input, lr, MLPout, weightsL2, targ, outunits, outofhid)
%BackProp returns the weight updates (deltas) for layer 1 and layer 2
%   These updates are applied to the previous weights to improve generalisation

%% Derivative of the sigmoid, evaluated at the hidden-layer outputs
DerivativeS = outofhid.*(ones(size(outofhid)) - outofhid);

%% Output-layer error (target minus network output)
error = zeros(10, length(MLPout));
for y = 1:length(outunits)
    for j = 1:length(MLPout)
        error(y,j) = targ(j) - MLPout(y,j);
    end
end

%% Weight update for the hidden-to-output weights
deltaW2 = lr.*(error*outofhid');

%% Weight update for the input-to-hidden weights
deltaW1 = lr*(((error'*weightsL2').*DerivativeS')'*input);
deltaW1 = deltaW1';
end
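
For context, the function is used inside a training loop along these lines (a rough sketch, not my exact code; `ForwardPass` is just a stand-in name for my forward-pass routine):

% Sketch of the training loop that calls BackProp (illustrative only).
% ForwardPass is a stand-in for the routine that returns the hidden-layer
% outputs (outofhid) and the softmax outputs (MLPout) for the current weights.
numEpochs = 100;                              % example value
for epoch = 1:numEpochs
    [MLPout, outofhid] = ForwardPass(input, weightsL1, weightsL2);
    [deltaW1, deltaW2, err] = BackProp(input, lr, MLPout, weightsL2, targ, outunits, outofhid);
    weightsL1 = weightsL1 + deltaW1;          % apply layer-1 update
    weightsL2 = weightsL2 + deltaW2;          % apply layer-2 update
    errPerEpoch(epoch) = sum(abs(err(:)));    % error curve that gets plotted
end
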
  • Have you normalized your data? Have you tried an already-tested solver with the same data and checked that it works (be sure to know how the solver works), so that you can rule out a problem with your code? What is the back-propagation equation you are trying to implement? I am too lazy to search for it. – Werner Dec 24 '14 at 00:32
  • Hi. Yes, I've normalised the data. I don't actually know what you mean by a solver. (Sorry, very green here.) – Diarmaid Finnerty Dec 24 '14 at 05:07
  • @Werner Hi. Yes, I've normalised the data and also normalised the outputs. I don't actually know what you mean by a solver. (Sorry, very green here.) The algorithm I'm trying to implement is the delta weight-update equation; you can find it on page L7-13 here: http://www.cs.bham.ac.uk/~jxb/INC/l7.pdf – Diarmaid Finnerty Dec 24 '14 at 05:14
  • By a solver, I mean an implemented framework used to solve some kind of problem, in this case neural networks. MATLAB offers this itself: try using the MATLAB toolbox if you have access to it, and check whether the issue is with your data or with your implementation. If this data was provided by your teacher and there is already other work using neural networks on it, you may skip this step; otherwise I strongly recommend you use a solver first to know how a neural network SHOULD work on the data you have. – Werner Dec 25 '14 at 14:41
  • Your problem may be these expressions: `lr.*(error*outofhid');` and `deltaW1 = lr*(((error'*weightsL2').*DerivativeS')'*input);`. You are using the matrix implementation, which is preferable in MATLAB because it is faster. Implement a `for` loop and check whether the results are the same (see the sketch after these comments). It really makes me think the error is in those lines, since you are using transposes and the `.*` operator and you do not yet seem used to them. You may well be multiplying matrices of the wrong dimensions, which would lead to wrong results. – Werner Dec 25 '14 at 14:51
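
A loop-based check along the lines Werner suggests could look roughly like this, shown for deltaW2 only (the dimensions are assumptions read off the code above: error as nOut-by-nSamples and outofhid as nHid-by-nSamples; the deltaW1 check would be analogous):

% Loop-based re-computation of deltaW2 to compare against the vectorised
% version (a sketch; assumes error is nOut x nSamples and outofhid is
% nHid x nSamples, so that error*outofhid' is nOut x nHid).
deltaW2check = zeros(size(error,1), size(outofhid,1));
for i = 1:size(error,1)
    for h = 1:size(outofhid,1)
        for n = 1:size(error,2)
            deltaW2check(i,h) = deltaW2check(i,h) + lr*error(i,n)*outofhid(h,n);
        end
    end
end
disp(max(abs(deltaW2check(:) - deltaW2(:))));   % should be ~0 if the matrix form is right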

1 Answer


From your code it looks to me like your error can take negative values. Try Euclidean distance as your cost function, something similar to:

error(y,j) = sqrt((targ(j) - MLPout(y,j))^2)

There are other cost functions that usually work better than Euclidean distance (because they are "more convex" and less likely to get stuck in local minima). For example, negative log-likelihood is one of the good choices. Some explanations and Python code are provided here; for a MATLAB implementation, you may find this page helpful.
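
As a rough MATLAB sketch of that cost (assuming the softmax outputs and one-hot targets are stored as nClasses-by-nSamples matrices called `probs` and `targets`, which are just illustrative names):

% Negative log-likelihood for a softmax output layer (illustrative sketch).
% probs   : nClasses x nSamples matrix of softmax outputs
% targets : nClasses x nSamples one-hot target matrix
epsval = 1e-12;                                            % guard against log(0)
nll = -sum(sum(targets .* log(probs + epsval))) / size(probs, 2);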

Amin Suzani