
I use a neural network with 3 layers for a categorization problem: 1) ~2k neurons 2) ~2k neurons 3) 20 neurons. My training set consists of 2 examples, and most of the inputs in each example are zeros. For some reason, after backpropagation training the network gives virtually the same output for both examples (the output is either valid for only one of the examples, or has 1.0 wherever either example has a 1). It reaches this state after the first epoch and doesn't change much afterwards, even if the learning rate is the smallest possible double value. I use sigmoid as the activation function. I thought something might be wrong with my code, so I tried the AForge open-source library, and it seems to suffer from the same issue. What might be the problem here?

Solution: I've removed one layer and decreased the number of neurons in the hidden layer to 800.
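For anyone curious what that looks like, here is a minimal sketch of the reduced topology in plain numpy (forward pass only; the input size isn't stated above, so 2000 inputs is an assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 2000, 800, 20  # n_in = 2000 is an assumption

    # Small random weights scaled by 1/sqrt(fan_in) keep the sigmoids
    # away from their flat saturated regions at the start of training.
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
    b2 = np.zeros(n_out)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        hidden = sigmoid(x @ W1 + b1)     # single 800-unit hidden layer
        return sigmoid(hidden @ W2 + b2)  # 20 sigmoid outputs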

Natalia

2 Answers


2000 by 2000 by 20 is huge. That's approximately 4 million weights to determine, meaning the algorithm has to search a 4-million-dimensional space. Any optimization algorithm will be totally at a loss in this case. I'm assuming you're using gradient descent, which is not even that powerful, so the algorithm is likely stuck in a local optimum somewhere in this gigantic search space.
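For reference, the arithmetic behind that figure (a quick Python check; layer sizes taken from the question, biases ignored):

    layer_sizes = [2000, 2000, 20]
    # Weights in a fully connected net: each adjacent pair of layers
    # contributes (size of one layer) * (size of the next) connections.
    n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    print(n_weights)  # 4,040,000 -- roughly 4 million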

Simplify your model!

Added:

And please also describe in more detail what you're trying to do. Do you really have only 2 training examples? That's like trying to categorize 2 points using a 4-million-dimensional plane. It doesn't make sense to me.

Def_Os
  • Decreasing the number of neurons (20x20x20) doesn't seem to do any good (the result is about the same). So is the problem the huge number of inputs? But that's unavoidable in my case. – Natalia Sep 27 '12 at 09:24

You mentioned that most of the inputs are zero. To reduce the size of your search space, try removing redundancy in your training examples. For instance, if

trainingExample[0].inputValue[i] == trainingExample[1].inputValue[i] 

then inputValue[i] carries no information the NN can use to tell the two examples apart.
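A toy illustration of that filter (hypothetical data, just to show the idea of dropping columns that are identical across all examples):

    import numpy as np

    # 2 examples, 5 inputs; columns that match across both rows carry
    # nothing the NN could use to tell the examples apart.
    X = np.array([[0, 0, 1, 0, 5],
                  [0, 3, 1, 0, 2]], dtype=float)

    informative = ~np.all(X == X[0], axis=0)  # False where all rows agree
    X_reduced = X[:, informative]
    print(X_reduced)  # only columns 1 and 4 survive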

Also, perhaps it goes without saying, but two training examples is a very small training set.

Rob Leclerc