I understand the basics and have built a simple neural network that solves the XOR problem. Now I want to build a neural network for digit recognition. I know that with the MNIST data I would need 784 input neurons (one per pixel of a 28×28 image), for example 15 hidden neurons, and 10 output neurons (one per digit, 0-9).

However, I don’t understand how the network would be trained and how feed forward would work with multiple output neurons.

For example, if the input was the pixels for the digit 3, how would the network determine which output neuron is picked? And during training, how would the network know which neuron should be associated with the target value?

Any help would be appreciated.

DadeKuma
Conor

1 Answer

So you have a multi-class classification problem, with one output neuron per class. I'm assuming that you are using a softmax activation function on the output layer.

How the network determines which output neuron is picked: simple, it picks the output neuron with the highest predicted probability. The index of that neuron is the predicted digit.
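To make the "pick the most probable neuron" step concrete, here is a minimal NumPy sketch of a forward pass for the 784-15-10 layout from the question. The function names, the sigmoid hidden activation, and the random (untrained) weights are assumptions for illustration, not something taken from your code.

```python
import numpy as np

def softmax(z):
    # Subtracting the max avoids overflow and does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def predict(x, W1, b1, W2, b2):
    # Hidden layer: 15 sigmoid units (an assumed choice of activation).
    hidden = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))
    # Output layer: 10 values that are non-negative and sum to 1.
    probs = softmax(W2 @ hidden + b2)
    # The predicted digit is the index of the largest probability.
    return int(np.argmax(probs)), probs

# Random, untrained weights and a fake "image", just to show the shapes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (15, 784)), np.zeros(15)
W2, b2 = rng.normal(0, 0.1, (10, 15)), np.zeros(10)
x = rng.random(784)

digit, probs = predict(x, W1, b1, W2, b2)
print(digit, probs.sum())
```

With untrained weights the predicted digit is meaningless; the point is only that output neuron `i` stands for digit `i`, and `argmax` selects the winner.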

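The network would be trained with standard backpropagation, the same algorithm that you would use with only one output. The target is simply a vector instead of a single number: for the digit 3 it is 1 at output neuron 3 and 0 at the other nine (a one-hot encoding), and this is how each neuron becomes associated with its class.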
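As a rough sketch of what one training step could look like, the following assumes a one-hot target, sigmoid hidden units, a softmax output, and a cross-entropy loss. None of these choices are confirmed by the question, and the helper names (`one_hot`, `train_step`) are hypothetical.

```python
import numpy as np

def one_hot(label, num_classes=10):
    # Target vector: 1 at the index of the true class, 0 elsewhere.
    t = np.zeros(num_classes)
    t[label] = 1.0
    return t

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def train_step(x, label, W1, b1, W2, b2, lr=0.01):
    # Forward pass.
    h = sigmoid(W1 @ x + b1)
    p = softmax(W2 @ h + b2)
    t = one_hot(label)
    loss = -np.log(p[label])               # cross-entropy for one example

    # Backward pass: with softmax + cross-entropy the output error is p - t.
    d_out = p - t
    d_hid = (W2.T @ d_out) * h * (1.0 - h)  # chain rule through the sigmoid

    # Gradient descent update (arrays are modified in place).
    W2 -= lr * np.outer(d_out, h)
    b2 -= lr * d_out
    W1 -= lr * np.outer(d_hid, x)
    b1 -= lr * d_hid
    return loss

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.1, (15, 784)), np.zeros(15)
W2, b2 = rng.normal(0, 0.1, (10, 15)), np.zeros(10)
x = rng.random(784)                         # stand-in for one MNIST image

# Repeating the step on the same example (labelled 3) should lower the loss.
losses = [train_step(x, 3, W1, b1, W2, b2) for _ in range(20)]
print(losses[0], losses[-1])
```

Every output neuron receives its own error term (`p - t`), so the update pushes neuron 3 up and the other nine down for this example. A real training loop would iterate over many labelled images rather than one.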

There is only one real difference: the activation function on the output layer. For binary classification you need only one output, typically a sigmoid (for example, with the digits 0 and 1: if the probability is < 0.5 then the class is 0, otherwise it is 1).
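For comparison, the binary case can be sketched in a few lines. This assumes a single sigmoid output neuron; `classify_binary` and its `threshold` parameter are hypothetical names used only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify_binary(z, threshold=0.5):
    # z is the pre-activation of the single output neuron.
    p = sigmoid(z)
    # Probability below the threshold means class 0, otherwise class 1.
    return (0 if p < threshold else 1), p

print(classify_binary(2.0))
print(classify_binary(-2.0))
```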

For multi-class classification you need an output node for each class; the predicted class is then the node with the highest probability.

DadeKuma
  • Awesome! If I were using a sigmoid function on the output layer, would it still work? If so, how would it work, and if not, why not? – Conor Feb 26 '19 at 00:36
  • It would work. Basically, if you use a softmax function, the probabilities on the output layer always sum to 1. If you use a sigmoid function, this is generally not the case. This is one of the main reasons why softmax is preferred over sigmoid in multi-class classification. – DadeKuma Feb 26 '19 at 00:45
  • Ahhh right, thank you for the help, it was very useful. – Conor Feb 26 '19 at 00:47
  • Also, I'm assuming that each input corresponds to only **one** class. If your outputs are independent (i.e. the input can belong to multiple classes at the same time), it's better to use sigmoids instead. In your case, each input can belong to only one class. – DadeKuma Feb 26 '19 at 00:49
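The sigmoid versus softmax point from the comments can be checked numerically. This is a small sketch with made-up output values (`z` is hypothetical, not taken from any trained network):

```python
import numpy as np

# Hypothetical pre-activations of four output neurons.
z = np.array([2.0, 1.0, 0.1, -1.0])

sigmoid_out = 1.0 / (1.0 + np.exp(-z))
exp_z = np.exp(z - z.max())
softmax_out = exp_z / exp_z.sum()

print(sigmoid_out.sum())   # independent values, not constrained to sum to 1
print(softmax_out.sum())   # a probability distribution over the classes

# Both functions preserve the ordering, so argmax picks the same neuron.
same_winner = np.argmax(sigmoid_out) == np.argmax(softmax_out)

# Multi-label use of sigmoids: every neuron above 0.5 counts as "on".
active = sigmoid_out > 0.5
print(same_winner, active)
```

This is why sigmoid outputs still "work" for picking a single digit, while softmax gives values that can be read directly as class probabilities.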