I wrote a neural network. It is mostly based (with some bug fixes) on the neural nets by James McCaffrey: https://visualstudiomagazine.com/articles/2015/04/01/back-propagation-using-c.aspx. I came across various Git projects and books using his code, and since he worked for MS Research I assumed his work would be good. Maybe not top of the bill (it doesn't run on top of CUDA or the like), but it's code that I can read, although I'm not into the science side of it. His sample worked on a dataset much like my problem.
My goal was to solve an image classification problem (a data set based on pixel info). The problem wasn't easy to recreate, but I managed to build a data set of 50 good scenarios and 50 bad scenarios. When I plotted the measurements in a scatter diagram, both sets had a lot of fuzzy boundary overlap. I myself was unable to make anything out of it; it was too fuzzy for me. Since I had 5 inputs per sample, I wondered whether a neural net might be able to find the inner relations and solve my fuzzy data classification problem.
And well, so it did... well, I kinda guess.
Depending on the seeding of the weights (which got me to 80%), the number of nodes, and the training time, I get training scores of around 85% to 90%, and lately 95%.
First I played with the random initialization of the weights. Then I played with the number of nodes. Then I played with the learn rate, momentum, and weight decay. The constants went from (scoring 85 to 90%):
// as in the example code i used
int maxEpochs = 100000;
double learnRate = 0.05;
double momentum = 0.01;
double weightDecay = 0.0001;
to (scoring 95%):
int maxEpochs = 100000;
double learnRate = 0.02; //had a huge effect
double momentum = 0.01;
double weightDecay = 0.001; //had a huge effect
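To make clear what those constants actually do, here is a minimal sketch of a single-weight update in the usual back-propagation style (a hypothetical standalone method following the common update rule, not McCaffrey's exact code): the learn rate scales the gradient step, momentum re-applies a fraction of the previous step, and weight decay shrinks the weight toward zero each epoch.

```java
public class WeightUpdate {
    // Hypothetical per-weight update: returns the new weight value.
    static double update(double weight, double gradient, double prevDelta,
                         double learnRate, double momentum, double weightDecay) {
        double delta = learnRate * gradient;    // gradient step, scaled by learn rate
        weight += delta + momentum * prevDelta; // momentum reuses the previous delta
        weight -= weightDecay * weight;         // decay shrinks the weight toward 0
        return weight;
    }

    public static void main(String[] args) {
        // Same gradient, smaller learn rate -> smaller step; larger decay -> more shrinkage.
        System.out.println(update(0.5, 0.1, 0.0, 0.02, 0.01, 0.001));
        System.out.println(update(0.5, 0.1, 0.0, 0.05, 0.01, 0.0001));
    }
}
```

This is why lowering learnRate from 0.05 to 0.02 can help on fuzzy, overlapping data: smaller steps make the training less likely to jump over a good minimum, while a larger weightDecay keeps the weights small and reduces overfitting.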
I'm a bit surprised that the number of nodes had less effect than changing the random initialization of the net and changing the constants above.
However, it makes me wonder:
- As a general rule of thumb, is 95% a high score? (I'm not sure where the limits are; I think it also depends on the data set. While I am amazed by 95%, I wonder if it would be possible to tweak it to 97%.)
- Should I try to minimize the number of hidden nodes? Currently it's a 5:9:3 network, but I once had a similar score with a 5:6:3 network.
- Is it normal for a neural network's score to be greatly influenced by the initial random weights (a different start seed)? I thought the training would overcome the starting situation.
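On the last point: back-propagation only finds a local minimum, so some sensitivity to the starting weights is expected. A common workaround is a multi-start loop: train the same net from several seeds and keep the best result. Sketched below with a hypothetical `trainAndScore` stand-in; a real version would run the full training and return accuracy on held-out data.

```java
import java.util.Random;

public class SeedRestart {
    // Hypothetical stand-in for "train the 5:9:3 net with this seed and
    // return test accuracy"; here it just derives a fake accuracy from the seed.
    static double trainAndScore(int seed) {
        Random rng = new Random(seed);          // seed drives the weight initialization
        return 0.85 + 0.10 * rng.nextDouble();  // fake accuracy in [0.85, 0.95)
    }

    // Train from 'restarts' different seeds and keep the best score.
    static double bestScore(int restarts) {
        double best = Double.NEGATIVE_INFINITY;
        for (int seed = 0; seed < restarts; seed++) {
            best = Math.max(best, trainAndScore(seed));
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.printf("best of 10 restarts: %.3f%n", bestScore(10));
    }
}
```

The best-of-N score can only match or beat any single run, which is one reason "lucky" seeds show up; comparing the spread across seeds also tells you how unstable the training really is.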