I've just created my first neural net, which uses gradient descent and the backpropagation learning algorithm, with the hyperbolic tangent as its activation function. The code is well unit tested, so I had high hopes that the net would actually work. Then I decided to create an integration test and try to teach my net to solve some very simple functions. Basically, I'm testing whether the weight improves (there's only one weight, as this is a very small net: one input plus one neuron).
// Combinations of negative sign for values greater than 1
[TestCase(8, 4)] // FAIL reason 1
[TestCase(-8, 4)] // FAIL reason 1
[TestCase(8, -4)] // FAIL reason 1
[TestCase(-8, -4)] // FAIL reason 1
// Combinations of negative sign for values less than 1
[TestCase(.8, .4)] // OK
[TestCase(-.8, .4)] // FAIL reason 2
[TestCase(.8, -.4)] // FAIL reason 2
[TestCase(-.8, -.4)] // OK
// Combinations of negative sign where one value is greater than 1 and the other is less than 1
[TestCase(-.8, 4)] // FAIL reason 2
[TestCase(8, -.4)] // FAIL reason 2
// Combinations where one value is greater than 1 and the other is less than 1
[TestCase(.8, 4)] // OK
[TestCase(8, .4)] // FAIL reason 1
public void ShouldImproveLearnDataSetWithNegativeExpectedValues(double expectedOutput, double x)
{
var sut = _netBuilder.Build(1, 1); // one input, only one layer with one output
sut.NetSpeedCoefficient = .9;
for (int i = 0; i < 400; i++)
{
sut.Feed(new[] { x }, new[] { expectedOutput });
}
var postFeedOutput = sut.Ask(new[] { x }).First();
var postFeedDifference = Math.Abs(postFeedOutput - expectedOutput);
postFeedOutput.Should().NotBe(double.NaN);
postFeedDifference.Should().BeLessThan(1e-5);
}
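For reference, since there is only one weight, I believe each Feed call boils down to a single delta-rule step. Here is a minimal sketch of what I think that step computes (assuming a squared-error loss; the variable names are illustrative, not taken from my actual code):

using System;

class SingleNeuronStep
{
    static void Main()
    {
        double w = 0.5;            // current weight (illustrative starting value)
        const double speed = 0.9;  // plays the role of NetSpeedCoefficient
        const double x = 4;        // input from a failing test case
        const double t = 8;        // expected output from the same case

        double y = Math.Tanh(w * x);           // forward pass
        double delta = (t - y) * (1 - y * y);  // error times tanh'(w * x)
        w += speed * delta * x;                // gradient-descent weight update

        Console.WriteLine($"output = {y}, updated weight = {w}");
    }
}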
I was very disappointed: most of the test cases failed (only the three marked '// OK' passed). I dug into the code and found some interesting facts.
- The hyperbolic tangent's range is (-1, 1). So no matter how large the sum of weight * input gets, the absolute value of the neuron's output is always less than 1. In other words, the net can never learn a function whose expected output has an absolute value greater than 1. That explains every failure of a test case with an expected output of 8 or -8 (see the quick check after this list).
- In test cases where exactly one of the two numbers is negative, the final weight should also be negative. At first the weight decreases, but it never becomes negative; it either stalls around 0 or jumps back and forth around 0.
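Here is the quick check for the first fact; tanh saturates long before the weighted sum gets anywhere near those targets (this is independent of my net):

using System;

class TanhRangeDemo
{
    static void Main()
    {
        Console.WriteLine(Math.Tanh(2));      // ~0.96403
        Console.WriteLine(Math.Tanh(10));     // ~0.99999996
        Console.WriteLine(Math.Tanh(1000));   // 1 (indistinguishable from 1 in double precision)
        Console.WriteLine(Math.Tanh(-1000));  // -1
        // No weighted sum, however large, can ever reach a target of 8 or -8.
    }
}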
Is a neural net only capable of solving problems with input values and expected output values in the 0..1 range, or is there something wrong with my implementation?