
I was recently learning about neural networks and came across the MNIST dataset. I understood that a sigmoid cost function is used to reduce the loss, and that the weights and biases get adjusted until optimum weights and biases are found during training. The thing I did not understand is: on what basis are the images classified? For example, to classify whether a patient has cancer or not, data like age, location, etc. become the features. In the MNIST dataset I did not find anything like that. Am I missing something here? Please help me with this.

rawwar
    There is probably more than *one* MNIST dataset. But be prepared to see that the pixels are features, as usual in end-to-end learning (also: the optimum is not necessarily found; theoretical guarantees are not common in NNs). – sascha Nov 14 '17 at 05:26
    You should go through this http://cs231n.github.io/convolutional-networks/ – Pramod Patil Nov 14 '17 at 06:09

1 Answer


First of all, the network pipeline consists of three main parts (a minimal sketch follows this list):

  • Input manipulation (the filters that are convolved with the image)
  • Parameters that affect the search for the minimum (the optimizer and its settings)
  • Parameters of the decision function in the interpretation layer (often a fully connected layer)
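Here is a minimal sketch of those three parts, assuming tf.keras; the layer sizes, optimizer, and learning rate are illustrative choices, not prescribed by the pipeline itself:

```python
# Minimal sketch of the three-part pipeline, assuming tf.keras;
# all hyperparameters below are illustrative, not canonical.
import tensorflow as tf

model = tf.keras.Sequential([
    # 1) Input manipulation: filters convolved with the 28x28 pixel image
    tf.keras.layers.Conv2D(8, kernel_size=3, activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2),
    # 3) Interpretation layer: fully connected + softmax decision function
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# 2) Parameters that affect finding the minimum: optimizer and loss
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```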

In contrast to a regular machine learning pipeline, where you have to extract features manually, a CNN learns filters (filters like those used in edge detection or Viola-Jones).

When a filter slides across the image and is convolved with the pixels, it produces an output (a feature map).
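A minimal sketch of this, assuming NumPy and SciPy; the Sobel kernel is a hand-crafted edge filter of the kind mentioned above, whereas a CNN would learn its kernel values during training:

```python
# Convolving a hand-crafted edge filter with an image (a sketch;
# the random "image" stands in for one MNIST digit).
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(28, 28)          # stand-in for one MNIST digit

sobel_x = np.array([[-1, 0, 1],         # responds to vertical edges
                    [-2, 0, 2],
                    [-1, 0, 1]])

feature_map = convolve2d(image, sobel_x, mode="same")
print(feature_map.shape)                # (28, 28): one response per pixel
```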

This output is then interpreted by a neuron. If the output is above a threshold it is considered active (a step function outputs 1 if active; a sigmoid instead maps the output to a value between 0 and 1 on the sigmoid curve).
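A small sketch of the two activations mentioned, assuming NumPy; the threshold and inputs are made-up values:

```python
# Step vs. sigmoid activation on a filter response (illustrative values).
import numpy as np

def step(z, threshold=0.0):
    """Hard threshold: 1 if the filter response exceeds the threshold."""
    return (z > threshold).astype(float)

def sigmoid(z):
    """Smooth alternative: squashes any response into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.5, 3.0])
print(step(z))     # [0. 1. 1.]
print(sigmoid(z))  # [0.119 0.622 0.953] (approx.)
```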

The subsequent layers repeat these same steps.

This continues until the interpretation layer (often a softmax). This layer interprets your computation: if the filters are well adapted to your problem, you get a good predicted label, which means a small difference between y_guess and y_true_label.
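A sketch of the softmax decision and the resulting error, assuming NumPy; the logits and labels below are made up for illustration:

```python
# Softmax interpretation layer plus cross-entropy error (a sketch).
import numpy as np

def softmax(logits):
    """Turns raw scores into class probabilities that sum to 1."""
    e = np.exp(logits - logits.max())   # shift for numerical stability
    return e / e.sum()

logits = np.array([1.2, 0.3, 4.1])      # scores for 3 hypothetical classes
y_guess = softmax(logits)
y_true = np.array([0.0, 0.0, 1.0])      # one-hot true label (class 2)

# Cross-entropy: small when y_guess puts high probability on the true class
loss = -np.sum(y_true * np.log(y_guess))
print(y_guess, loss)
```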

Now you can see that, to produce the guess of y, the input x was multiplied by many weights w and passed through several functions. This composition of functions is exactly where the chain rule from analysis applies.
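For a two-layer network (one sigmoid hidden layer assumed here purely for illustration), the composition and the chain rule look like this:

```latex
% a is the hidden activation, \hat{y} the prediction, E the error.
a = \sigma(W_1 x), \qquad \hat{y} = \sigma(W_2 a), \qquad
E = \tfrac{1}{2}\,(\hat{y} - y)^2

\frac{\partial E}{\partial W_1}
  = \frac{\partial E}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial a}
    \cdot \frac{\partial a}{\partial W_1}
```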

To get better results, the effect of every single weight on the error must be known. For that you use backpropagation, which computes the derivative of the error with respect to all weights w. The trick is that you can reuse intermediate derivatives (that reuse is more or less what backpropagation is), and it becomes easier since you can use matrix-vector notation.
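A hand-rolled sketch of that reuse, assuming NumPy; the layer sizes and data are made up. Note how the output error signal `delta2` is reused when computing `delta1` for the earlier layer:

```python
# Backpropagation by hand for a tiny 2-layer sigmoid network (a sketch).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(4)                 # 4 input "pixels"
y = 1.0                           # true label
W1 = rng.standard_normal((3, 4))  # hidden layer weights
W2 = rng.standard_normal((1, 3))  # output layer weights

# Forward pass
a = sigmoid(W1 @ x)               # hidden activations
y_hat = sigmoid(W2 @ a)[0]        # prediction

# Backward pass (chain rule in matrix-vector notation)
delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # error signal at the output
dW2 = delta2 * a[np.newaxis, :]              # gradient for W2
delta1 = (W2[0] * delta2) * a * (1 - a)      # delta2 is reused here
dW1 = np.outer(delta1, x)                    # gradient for W1
print(dW1.shape, dW2.shape)       # (3, 4) (1, 3)
```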

Once you have the gradient, you can apply the usual concept of minimization and walk along the direction of steepest descent. (There are also many other gradient methods like AdaGrad or Adam.)

These steps repeat until convergence or until you reach the maximum number of epochs.
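A sketch of that loop on a toy convex loss, assuming NumPy; the learning rate, tolerance, and epoch limit are illustrative values:

```python
# Steepest-descent loop with a convergence check and an epoch limit.
import numpy as np

def loss_and_grad(w):
    """Toy convex loss 0.5*||w - 3||^2 with gradient (w - 3)."""
    return 0.5 * np.sum((w - 3.0) ** 2), (w - 3.0)

w = np.zeros(2)
lr, max_epochs, tol = 0.1, 1000, 1e-8

for epoch in range(max_epochs):
    loss, grad = loss_and_grad(w)
    w -= lr * grad                      # step along the steepest descent
    if np.linalg.norm(grad) < tol:      # stop early on convergence
        break

print(epoch, w)                         # w converges toward [3., 3.]
```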

So the answer is: THE LEARNED WEIGHTS (FILTERS) ARE THE KEY TO DETECTING THE DIGITS; the raw pixels are the features they operate on. :)

Max Krappmann