
Typically, a simple neural network to solve XOR has 2 inputs, 2 neurons in the hidden layer, and 1 neuron in the output layer.

However, the following example implementation has 2 output neurons, which I don't understand:

https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/xor/XorExample.java

Why did the author put 2 output neurons in there?

Edit: The author of the example noted that he is using 4 neurons in the hidden layer and 2 neurons in the output layer. But I still don't understand why: why a shape of {4,2} instead of {2,1}?

Dee
    He explained it in the comments at the top. (it's another question how good this explanation is in regards to formal math) – sascha May 11 '17 at 23:24
    For all future questions, JFYI, there's an active dev community on the Gitter channel: https://gitter.im/deeplearning4j/deeplearning4j – racknuf May 12 '17 at 23:55
  • Yeah, that chat room is interesting; someone there helped me figure out how to match the activation function with the loss function – Dee May 13 '17 at 00:10

3 Answers


This is called one-hot encoding. The idea is that you have one neuron per class; each neuron outputs the probability of that class.
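A minimal sketch of the idea (hypothetical, plain Java, not the dl4j-examples code): each XOR target (0 or 1) becomes a 2-element one-hot vector, with a 1 at the index of its class.

```java
// Hypothetical sketch: one-hot encoding the XOR truth table.
// Class 0 = false, class 1 = true, so target 1 becomes [0, 1].
public class OneHotXor {
    static int[] oneHot(int classIndex, int numClasses) {
        int[] v = new int[numClasses];
        v[classIndex] = 1;
        return v;
    }

    public static void main(String[] args) {
        int[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (int[] in : inputs) {
            int xor = in[0] ^ in[1];       // the XOR target (0 or 1)
            int[] label = oneHot(xor, 2);  // e.g. 1 -> [0, 1]
            System.out.printf("%d XOR %d -> [%d, %d]%n",
                    in[0], in[1], label[0], label[1]);
        }
    }
}
```

With 2 classes this is exactly why the example's output layer has 2 neurons: one per class.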

I don't know why he uses 4 hidden neurons. 2 should be enough (if I remember correctly).

Martin Thoma
  • I find one-hot encoding good only when we don't have too many classes to classify – Dee May 12 '17 at 07:34
  • Yeah, I don't know why he's using 4 hidden neurons either; I changed it to 2 and it still works perfectly! – Dee May 12 '17 at 07:38
  • Because there would be too many one-hot neurons in the output layer; I don't know, but what if we need to classify that many? – Dee May 12 '17 at 10:12
    @johnlowvale I've never seen anything else. The largest number of classes I'm aware of is 1000 for ImageNet. One-hot encoding is no problem there. – Martin Thoma May 12 '17 at 10:36
  • It caused me some confusion too. The first two numbers are the position in a 4-by-2 table. The third number (off on its own) is the value to be placed at that coordinate. – Adam Gerard Mar 30 '19 at 05:41

The author uses the Evaluation class at the end (to report how often the network gives the correct result). This class needs one neuron per class to work correctly, i.e. one output neuron for true and one for false.
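In essence, evaluating one-hot outputs boils down to comparing the index of the largest network output (argmax) with the index of the 1 in the label. A plain-Java sketch of that idea (hypothetical, not the DL4J Evaluation implementation):

```java
// Hypothetical sketch: score a one-hot prediction by comparing
// the argmax of the network output with the argmax of the label.
public class ArgmaxEval {
    static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] networkOutput = {0.1, 0.9}; // e.g. softmax over {false, true}
        double[] oneHotLabel   = {0.0, 1.0}; // the true class
        boolean correct = argmax(networkOutput) == argmax(oneHotLabel);
        System.out.println("correct = " + correct);
    }
}
```

With only a single output neuron there would be no per-class index to compare, which is why the class-based evaluation wants one neuron per class.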

Shaido
    I asked someone on the deeplearning4j chat room; he said it's because of the softmax activation function at the output layer – Dee May 12 '17 at 07:35

It might be helpful to think of it like this:

Training Set            Label Set

        0 | 1                   0 | 1
   0 |  0 | 0              0 |  0 | 1
   1 |  1 | 0              1 |  1 | 0
   2 |  0 | 1              2 |  1 | 0
   3 |  1 | 1              3 |  0 | 1

So, reading row by row, the Training Set holds the inputs [0,0], [1,0], [0,1], [1,1].

If you're using the two-column Label Set, the two columns correspond to true and false.

Thus, input [0,0] correctly maps to false, input [1,0] correctly maps to true, and so on.
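The mapping above can be checked in plain Java (a hypothetical illustration; following the table, label column 0 fires for true and column 1 for false):

```java
// Hypothetical sketch of the table above: verify that each one-hot
// label row agrees with the XOR of its input row.
public class XorTable {
    // Column 0 of the label fires for true, column 1 for false.
    static boolean labelMatches(int[] input, int[] label) {
        boolean xorIsTrue = (input[0] ^ input[1]) == 1;
        return (label[0] == 1) == xorIsTrue;
    }

    public static void main(String[] args) {
        int[][] trainingSet = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        int[][] labelSet    = {{0, 1}, {1, 0}, {1, 0}, {0, 1}};
        for (int row = 0; row < trainingSet.length; row++) {
            System.out.printf("row %d agrees: %b%n",
                    row, labelMatches(trainingSet[row], labelSet[row]));
        }
    }
}
```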

A pretty good article that slightly modifies the original can be found here: https://medium.com/autonomous-agents/how-to-teach-logic-to-your-neuralnetworks-116215c71a49

Adam Gerard