
I am trying to train a fully connected neural network to classify hand-written digits using the MNIST dataset. The neural network is implemented by me in C++ as part of my course project. However, I find the training behaves strangely, and I do not know what is going wrong.

My course instructor asks us to use Sigmoid as the activation function and MSE as the loss function, even for the output layer. I have some doubts about whether this is the correct choice, but I am following his instructions for now.

My network structure is:

28*28 (input layer) value: 0-1
|
|
500 (hidden layer) activation: sigmoid
|
|
10 (output layer) activation: sigmoid
|
|
loss: MSE

The learning rate is 0.4 and the batch size is 100. The loss quickly drops to about 0.5 but then stops decreasing. The testing accuracy rises to 60% and does not improve any further.

I wonder whether I implemented the neural network incorrectly, or whether I should not be using Sigmoid and MSE. Thank you!

Ronghao

1 Answer


Change your activation and loss function.
Sigmoid is used in the output layer for binary classification, but MNIST is a multi-class classification problem, so use softmax in the output layer instead. For the loss, use categorical cross-entropy; MSE (mean squared error) is meant for regression.