
I'm currently learning about neural networks and have attempted to train an MLP to learn XOR with back-propagation in Python. The network has two hidden layers (with sigmoid activations) and one output layer (also sigmoid). After around 20,000 epochs with a learning rate of 0.1, the network outputs numbers close to the original class labels:

prediction: 0.11428432952745145 original class output was: 0

prediction: 0.8230114358069576 original class output was: 1

prediction: 0.8229532575410421 original class output was: 1

prediction: 0.23349671680470516 original class output was: 0
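
Here's a minimal sketch of the kind of setup I mean (the hidden-layer widths, the random initialization, and the squared-error loss are illustrative choices rather than exact details):

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # XOR inputs and class labels
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # two hidden layers plus a sigmoid output, all with biases
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 4)), np.zeros(4)
    W3, b3 = rng.normal(size=(4, 1)), np.zeros(1)

    lr, epochs = 0.1, 20000
    errors = []  # summed squared error per epoch, for plotting later

    for _ in range(epochs):
        # forward pass
        h1 = sigmoid(X @ W1 + b1)
        h2 = sigmoid(h1 @ W2 + b2)
        out = sigmoid(h2 @ W3 + b3)

        err = out - y
        errors.append(float(np.sum(err ** 2)))

        # backward pass; the constant factor 2 from the squared error
        # is folded into the learning rate
        d3 = err * out * (1 - out)
        d2 = (d3 @ W3.T) * h2 * (1 - h2)
        d1 = (d2 @ W2.T) * h1 * (1 - h1)

        W3 -= lr * (h2.T @ d3); b3 -= lr * d3.sum(axis=0)
        W2 -= lr * (h1.T @ d2); b2 -= lr * d2.sum(axis=0)
        W1 -= lr * (X.T @ d1);  b1 -= lr * d1.sum(axis=0)

    for pred, target in zip(out.ravel(), y.ravel()):
        print("prediction:", pred, "original class output was:", int(target))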

When I plot the errors for every epoch, my graph shows a steep decline and then a slight 'bump'; I was under the impression that the errors would gradually decrease:

[Plot: errors (summed) vs. epoch]

Would this be classed as converging? I've tried adjusting the learning rate, with no luck.

Thanks!

2 Answers


Not necessarily. The NN solves an optimization problem by adjusting the weights, and the error is not guaranteed to fall monotonically; some of the gradient-descent steps may pick "worse" values. I would recommend experimenting with more epochs, and eventually it will converge. If you want, post your code for more specific tips.
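
For instance, you can count how often the summed error actually goes up from one epoch to the next (assuming you record it in a list, here called errors, as in the sketch in the question):

    # errors: per-epoch summed errors recorded during training (assumed name)
    increases = sum(1 for prev, cur in zip(errors, errors[1:]) if cur > prev)
    print(increases, "of", len(errors) - 1, "epoch-to-epoch steps increased the error")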

partizanos
  • Thank you! I will increase the number of epochs and re-run the experiment. I've only tried sigmoid activation functions, so I will experiment with others as well. – Laurent Kelly Apr 01 '21 at 17:28

Yes -- definitely converging! You're getting the characteristic XOR learning curve for an MLP with sigmoid activations -- you could put that in a textbook. And it's converging no faster than expected given that number of epochs; in fact, you could probably set the learning rate (the step size) higher.

Assessing convergence statistically (rather than via a closed-form limit or a graph) can be a bit difficult, but that plot is pretty good evidence of convergence.
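
One rough numeric check along these lines (a sketch of a heuristic, not a standard test): treat the run as converged when the mean error over the most recent window of epochs stops changing relative to the window before it.

    import numpy as np

    def has_converged(errors, window=500, tol=1e-3):
        # rough heuristic: relative change between the means of the last
        # two windows of per-epoch errors falls below tol
        if len(errors) < 2 * window:
            return False
        recent = np.mean(errors[-window:])
        previous = np.mean(errors[-2 * window:-window])
        return abs(previous - recent) / max(abs(previous), 1e-12) < tol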

gerowam
  • Thanks! I will try a few different learning rates. In terms of plotting the error, would you recommend simply summing the error every epoch, or taking an average every epoch? I realised I completely forgot to add a bias (I wasn't sure if I needed one), so I'm programming that in now as well. – Laurent Kelly Apr 01 '21 at 17:25
  • A bias isn't strictly necessary for XOR, but it could help if your layers are narrow. As for summing vs. averaging -- they're identical unless your test sets have different numbers of points. That is, it's either sum(E) or sum(E) / N; if N is constant, there's no difference. In most ANN research, you'll see the average error reported. – gerowam Apr 02 '21 at 18:13
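
A quick illustration of the summing-vs-averaging point (the error values here are made up):

    import numpy as np

    per_sample_errors = np.array([0.11, 0.18, 0.18, 0.23])  # made-up values
    n = len(per_sample_errors)
    assert np.isclose(per_sample_errors.mean(), per_sample_errors.sum() / n)
    # with a fixed N, the two curves differ only by the constant factor 1/N,
    # so they rise and fall together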