
I am implementing a neural network for a classification problem. I initially tried backpropagation, but it takes too long to converge, so I thought of using RPROP. In my test setup RPROP works fine for the AND gate simulation but never converges for the OR and XOR gate simulations.

  1. How and when should I update the bias for RPROP?
  2. Here is my weight update logic:

for(int l_index = 1; l_index < _total_layers; l_index++){
    Layer* curr_layer = get_layer_at(l_index);

    //iterate through each neuron
    for (unsigned int n_index = 0; n_index < curr_layer->get_number_of_neurons(); n_index++) {
        Neuron* jth_neuron = curr_layer->get_neuron_at(n_index);

        double change = jth_neuron->get_change();

        double curr_gradient = jth_neuron->get_gradient();
        double last_gradient = jth_neuron->get_last_gradient();

        int grad_sign = sign(curr_gradient * last_gradient);

        //iterate through each weight of the neuron
        for(int w_index = 0; w_index < jth_neuron->get_number_of_weights(); w_index++){
            double current_weight = jth_neuron->give_weight_at(w_index);
            double last_update_value = jth_neuron->give_update_value_at(w_index);

            double new_update_value = last_update_value;
            if(grad_sign > 0){
                new_update_value = min(last_update_value*1.2, 50.0);
                change = sign(curr_gradient) * new_update_value;
            }else if(grad_sign < 0){
                new_update_value = max(last_update_value*0.5, 1e-6);
                change = -change;
                curr_gradient = 0.0;
            }else if(grad_sign == 0){
                change = sign(curr_gradient) * new_update_value;
            }

            //Update neuron values
            jth_neuron->set_change(change);
            jth_neuron->update_weight_at((current_weight + change), w_index);
            jth_neuron->set_last_gradient(curr_gradient);
            jth_neuron->update_update_value_at(new_update_value, w_index);

            double current_bias = jth_neuron->get_bias();
            jth_neuron->set_bias(current_bias + _learning_rate * jth_neuron->get_delta());
        }
    }
}
puru020

1 Answer


In principle you don't treat the bias differently than you did before with backpropagation: it's learning_rate * delta, which you seem to be doing.
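
A minimal sketch of that bias update in Java (hypothetical accessor names, assuming a per-neuron bias and a precomputed delta):

// Bias is nudged exactly as in plain backpropagation: bias += learning_rate * delta.
void updateBias(Neuron neuron, double learningRate) {
    double newBias = neuron.getBias() + learningRate * neuron.getDelta();
    neuron.setBias(newBias);
}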

One source of error may be that the sign of the weight change depends on how you calculate your error. There are different conventions: using (t_i - y_i) instead of (y_i - t_i) should result in returning (new_update_value * sign(grad)) instead of -(new_update_value * sign(grad)), so try switching the sign. I'm also unsure about how you implemented everything specifically, since a lot is not shown here. But here's a snippet from a Java implementation of mine that might be of help:

// gradient didn't change sign:
if(weight.previousErrorGradient * errorGradient > 0)
    weight.lastUpdateValue = Math.min(weight.lastUpdateValue * step_pos, update_max);
// changed sign:
else if(weight.previousErrorGradient * errorGradient < 0)
    weight.lastUpdateValue = Math.max(weight.lastUpdateValue * step_neg, update_min);
else
    weight.lastUpdateValue = weight.lastUpdateValue; // no change

// Depending on language, you should check for NaN here.

// multiply this with -1 depending on your error signal's sign:
return ( weight.lastUpdateValue * Math.signum(errorGradient) ); 
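
Applied per weight in the training loop, the returned step could be used roughly like this (a sketch with hypothetical names; computeRpropStep stands for the snippet above):

// Sketch: apply the RPROP step to each weight and remember the gradient for the next epoch.
for (Weight weight : neuron.getWeights()) {
    double errorGradient = weight.getErrorGradient();       // dE/dw for this weight
    double step = computeRpropStep(weight, errorGradient);  // snippet above
    weight.value += step;                                    // flip to -= depending on your error sign convention
    weight.previousErrorGradient = errorGradient;            // needed for the sign comparison next time
}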

Also, keep in mind that 50.0, 1e-6 and especially 0.5 and 1.2 are empirically chosen values, so they might need to be adjusted. You should definitely print out the gradients and weight changes to see if something weird is going on (e.g. exploding gradients leading to NaN, even though you're only testing AND/XOR). Your last_gradient value should also be initialized to 0 at the first timestep.
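
For reference, a sketch of the usual constants and initial per-weight state (field names are hypothetical; the 0.1 initial step size is the value recommended in the original RPROP paper):

// Standard RPROP hyperparameters (empirical, adjust if needed).
static final double STEP_POS   = 1.2;   // eta+
static final double STEP_NEG   = 0.5;   // eta-
static final double UPDATE_MAX = 50.0;  // delta_max
static final double UPDATE_MIN = 1e-6;  // delta_min

// Per-weight state before the first epoch.
weight.lastUpdateValue       = 0.1;     // delta_0, the initial step size
weight.previousErrorGradient = 0.0;     // makes the first update take the "no sign change" branch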

runDOSrun
  • Thanks for replying. I am using the (t_i - y_i) convention. My last_gradient is initialized to 0 in the class constructor. I will go through the code again, though, and check the things you suggested. – puru020 Sep 27 '15 at 20:52
  • So I fixed the issue and it now works (at least partially). In 6-7 iterations out of 10, the error either goes down initially and then stops going down, or keeps oscillating down..up..down..up. What could be the reason? I don't think this is normal. – puru020 Sep 28 '15 at 23:06
  • It's overshooting local minima because the gradient descent steps are too large. Try adjusting the parameters. – runDOSrun Sep 29 '15 at 10:48