I am trying to train a fully connected neural network to classify handwritten digits from the MNIST dataset. I implemented the neural network myself in C++ as part of a course project. However, the training behaves strangely, and I cannot tell what is going wrong.
My course instructor asked us to use sigmoid as the activation function and MSE as the loss function, even for the output layer. I have some doubt about whether this is the right choice, but I followed his instructions.
My network structure is:
28*28 (input layer) value: 0-1
|
|
500 (hidden layer) activation: sigmoid
|
|
10 (output layer) activation: sigmoid
|
|
loss: MSE
The learning rate is 0.4 and the batch size is 100. The loss quickly drops to 0.5 but then plateaus; the test accuracy climbs to about 60% and does not improve further.
I wonder whether I implemented the neural network incorrectly, or whether I simply should not be using sigmoid with MSE. Thank you!