I'm a beginner trying to implement backpropagation in C# for a school assignment (so no TensorFlow for now, we have to learn it manually). My network has 64 nodes in the input layer and 64 nodes in the output layer, somewhat like an autoencoder structure, since we will be discussing MLPs later on.
I'm calculating the output delta as:
delta_out = (y_out) * (1 - y_out) * (desired - y_out)
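In C# that looks roughly like the sketch below (simplified, not my exact code; the names yOut and desired are just illustrative, and a sigmoid activation is assumed, which is where the y * (1 - y) factor comes from):

```csharp
// Simplified sketch of the output-layer delta computation (sigmoid assumed).
// yOut[k] is the activated output of node k, desired[k] is its target value.
static double[] ComputeOutputDeltas(double[] yOut, double[] desired)
{
    var deltaOut = new double[yOut.Length];
    for (int k = 0; k < yOut.Length; k++)
    {
        double y = yOut[k];
        // sigmoid derivative y * (1 - y) times the output error (desired - y)
        deltaOut[k] = y * (1.0 - y) * (desired[k] - y);
    }
    return deltaOut;
}
```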
I have tested my program on an XOR input/output scenario and it predicts correctly there, but when I use all 64 input and output nodes, it does not give correct predictions at all (roughly 0% accuracy).
I'm also tracking the sum of abs(delta_out) over all output nodes. For the XOR scenario, this sum approaches zero as training progresses. But for the 64-input/64-output test, the sum starts at a very small number and stays there.
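The tracking itself is just something like this (again a sketch for illustration; SumAbsoluteDeltas is a name I'm using here, and deltaOut would be the array from the delta computation above):

```csharp
using System;
using System.Linq;

// Sums the absolute output deltas so I can log one number per epoch.
static double SumAbsoluteDeltas(double[] deltaOut)
{
    return deltaOut.Sum(d => Math.Abs(d));
}
```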
For the XOR case that works properly (I have also tried OR and AND tests, which work fine as well), I used the following structure: 2 input nodes, 4 hidden nodes, and 1 output node.
For the 64-input/64-output case, I have tested various hidden layer sizes, from 8 up to 128 nodes. If I use 64 or more hidden nodes, the absolute sum of all the delta_out values is near 0 even at the start and changes too slowly.
I have also tested various learning rates (separate rates for the hidden and output layers), from 0.1 to 0.75, but that doesn't seem to help for the 64-input/64-output case I'm supposed to solve. I have also increased the number of epochs from 100k to 500k, but nothing seems to help.
Maybe I don't understand the backpropagation concept well enough?