Introduction
I'm very new to Artificial Intelligence, Machine Learning, and Neural Networks.
I wrote some code with the help of the FANN (Fast Artificial Neural Network) library (C++) to test the capabilities of this kind of system.
Programming
I made a small piece of code that generates a learning file for supervised training. I had already run some tests, but this one was made to understand the relation between the organization of the hidden layers and the network's ability to solve the same problem.
To explain my observations, I will use the notation A-B-C-[...]-X to describe a configuration with A input neurons, B neurons on the first hidden layer, C neurons on the second, ..., and X output neurons.
In these tests, the learning data consisted of 2,000 random results of a working NOT function (f(0)=1; f(1)=0), the equivalent of '!' in many languages. Note also that an epoch represents one training pass over all the learning data, and "AI" will refer to a trained ANN.
No errors were introduced in the learning data.
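For reference, a FANN training file starts with a line giving the number of pairs, the number of inputs, and the number of outputs, followed by alternating input and output lines. A minimal sketch of such a generator (the file name not.data and the use of std::rand are placeholders of mine) could look like this:

// generate_not_data.cpp - write a FANN training file for the NOT function
#include <cstdlib>
#include <ctime>
#include <fstream>

int main()
{
    const int num_samples = 2000;
    std::ofstream file("not.data");

    // FANN training file header: <number of pairs> <inputs> <outputs>
    file << num_samples << " 1 1\n";

    std::srand(static_cast<unsigned>(std::time(nullptr)));
    for (int i = 0; i < num_samples; ++i)
    {
        const int in = std::rand() % 2;          // random 0 or 1
        file << in << "\n" << (1 - in) << "\n";  // the expected output is NOT(input)
    }
    return 0;
}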
You can find the entire source code on my GitHub Repo.
More is not better
First, I noticed that a 1-1-1 system performs better after 37 epochs than a 1-[50 layers of 5 neurons]-1 system does after 20k epochs (an error rate of 0.0001 against 0.25).
My first thought was that the second AI simply needed more training, because there are many more weights to adjust, but I'm not sure this is the only reason.
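As a rough sketch of how these two configurations can be built and trained with the FANN C API (the training file name, report intervals, and epoch limits here are placeholders of mine), the comparison looks roughly like this:

// compare_depth.cpp - 1-1-1 versus 1-[50 layers of 5 neurons]-1
#include <vector>
#include <fann.h>

int main()
{
    // 1-1-1: one input neuron, one hidden neuron, one output neuron.
    struct fann *small = fann_create_standard(3u, 1u, 1u, 1u);

    // 1-[50 x 5]-1: 52 layers in total, built from an array of layer sizes.
    std::vector<unsigned int> layers;
    layers.push_back(1);
    for (int i = 0; i < 50; ++i)
        layers.push_back(5);
    layers.push_back(1);
    struct fann *deep =
        fann_create_standard_array(static_cast<unsigned int>(layers.size()), layers.data());

    // Train both on the same NOT data, stopping at 0.0001 error or at the epoch limit.
    fann_train_on_file(small, "not.data", 100, 10, 0.0001f);
    fann_train_on_file(deep, "not.data", 20000, 1000, 0.0001f);

    fann_destroy(small);
    fann_destroy(deep);
    return 0;
}

fann_create_standard_array is convenient for the deep network because listing 52 layer sizes by hand as variadic arguments would be unwieldy.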
This led me to try some tests with the same total number of neurons.
Equal is not equal
A 1-2-2-1 configuration seems more efficient than a 1-4-1 one.
When I ran a test on those two configurations (with a testing program I coded myself), I got the outputs shown below. These are two separate runs; "9**" is the current index of the test iteration.
The test consists of feeding a random integer, 0 or 1, to the AI and printing the output.
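A minimal sketch of such a test loop, assuming the trained network was saved to a file named not.net (that file name is a placeholder of mine), could look like this:

// test_not.cpp - feed random 0/1 inputs to a trained network and print the result
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <fann.h>

int main()
{
    struct fann *ann = fann_create_from_file("not.net");
    if (!ann)
        return 1;

    std::srand(static_cast<unsigned>(std::time(nullptr)));
    for (int i = 0; i < 1000; ++i)
    {
        fann_type input[1];
        input[0] = static_cast<fann_type>(std::rand() % 2);   // random 0 or 1

        fann_type *output = fann_run(ann, input);
        std::printf("[%d]Number : %f, output : %f\n", i, input[0], output[0]);
    }

    fann_destroy(ann);
    return 0;
}

Here are the two runs: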
// 1-2-2-1
[936]Number : 0.000000, output : 1.000000
[937]Number : 1.000000, output : 0.009162
[938]Number : 0.000000, output : 1.000000
[939]Number : 0.000000, output : 1.000000
[940]Number : 1.000000, output : 0.009162
[941]Number : 0.000000, output : 1.000000
[942]Number : 0.000000, output : 1.000000
// 1-4-1
[936]Number : 0.000000, output : 1.000000
[937]Number : 0.000000, output : 1.000000
[938]Number : 1.000000, output : 0.024513
[939]Number : 0.000000, output : 1.000000
[940]Number : 0.000000, output : 1.000000
[941]Number : 1.000000, output : 0.024513
[942]Number : 1.000000, output : 0.024513
You can notice that the first config gives a result closer to 0 than the second one (0.009162 against 0.024513). That's not an IEEE floating-point encoding issue, and those two values don't change if I run another test.
What's the reason for that? Let's try to figure it out.
- How many "synapses" do we have in the first config?
first
first[0]->second[0]
first[0]->second[1]
then
second[0]->third[0]
second[0]->third[1]
second[1]->third[0]
second[1]->third[1]
final
third[0]->fourth[0]
third[1]->fourth[0]
So we get a total of 2 + 4 + 2 = 8 synapses (and thus 8 different weights).
- What about the second configuration?
first
first[0]->second[0]
first[0]->second[1]
first[0]->second[2]
first[0]->second[3]
final
second[0]->third[0]
second[1]->third[0]
second[2]->third[0]
second[3]->third[0]
So we get a total of 4 + 4 = 8 synapses (still 8 different weights).
And in both systems we have 4 activation functions (one for each hidden neuron).
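FANN can also report this count itself. A small sketch using fann_get_total_connections is shown below; note that FANN adds a bias neuron to every layer except the output one, so the totals it reports include the bias connections and are higher than the 8 synapses counted by hand above, but they should still be equal between the two configurations.

// count_connections.cpp - let FANN report the number of weights in each configuration
#include <cstdio>
#include <fann.h>

int main()
{
    struct fann *deep_narrow  = fann_create_standard(4u, 1u, 2u, 2u, 1u);  // 1-2-2-1
    struct fann *shallow_wide = fann_create_standard(3u, 1u, 4u, 1u);      // 1-4-1

    // Totals include bias connections, so both are larger than 8 but still comparable.
    std::printf("1-2-2-1 connections: %u\n", fann_get_total_connections(deep_narrow));
    std::printf("1-4-1   connections: %u\n", fann_get_total_connections(shallow_wide));

    fann_destroy(deep_narrow);
    fann_destroy(shallow_wide);
    return 0;
}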