Questions tagged [neural-network]

Network structure inspired by simplified models of biological neurons (brain cells). Neural networks are trained to "learn" by supervised and unsupervised techniques, and can be used to solve optimization and approximation problems, classify patterns, and combinations thereof.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise you're probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site?

Neural networks have many practical applications within the software realm.

An application of neural networks to supervised learning is training a network for optical character recognition or handwriting recognition. The network is trained on exemplars of characters, and given enough data forming a representative sample of the population, it can generalize to a wider spectrum of cases that were not encountered during training. Training a neural network in a supervised manner involves a learning algorithm that finds the weights of the neurons which minimize the network's error at performing the task. Gradient descent is an example of a learning algorithm commonly used to adjust the weights of a neural network. It is often paired with the backpropagation technique, which measures the contribution of each weight to the error signal and thereby determines the gradients that guide the learning algorithm in adjusting each weight.
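
To make the procedure concrete, here is a minimal NumPy sketch of gradient descent with backpropagation (the two-layer network, layer sizes, and learning rate are arbitrary choices for illustration, learning XOR rather than character recognition):

```python
import numpy as np

# Tiny 2-layer network trained on XOR with full-batch gradient descent.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(5000):
    # Forward pass: compute the network's output for every sample.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backpropagation: measure each weight's contribution to the error.
    d_out = (out - y) * out * (1 - out)   # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # error propagated to the hidden layer

    # Gradient descent: adjust each weight against its gradient.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2))   # approaches [[0], [1], [1], [0]]
```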

For an example of a backpropagation network in action, see the source of GNU Backgammon.

A frequently used network topology in unsupervised learning is the Self-Organizing Map (SOM), introduced by Teuvo Kohonen. These networks can be used for clustering data and, more generally, for providing a lower-dimensional representation of a higher-dimensional space.

See this Code Project article for an application of the Self-Organizing Map to clustering images in order to find all of the unique faces.
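
For a sense of the mechanics, here is a minimal NumPy sketch of SOM training (not taken from that article; the grid size and the learning-rate and radius schedules are arbitrary illustrative choices):

```python
import numpy as np

# Fit a 10x10 grid of units to 3-dimensional data (e.g. RGB colors), so the
# 2-D grid becomes a lower-dimensional representation of the 3-D input space.
rng = np.random.default_rng(0)
data = rng.random((500, 3))
grid = rng.random((10, 10, 3))   # one weight vector per grid cell

# Grid coordinates, used by the neighborhood function below.
coords = np.stack(np.meshgrid(np.arange(10), np.arange(10), indexing="ij"), axis=-1)

n_steps = 2000
for t in range(n_steps):
    x = data[rng.integers(len(data))]

    # 1. Find the best-matching unit (BMU): the unit closest to the sample.
    dists = np.linalg.norm(grid - x, axis=-1)
    bmu = np.unravel_index(dists.argmin(), dists.shape)

    # 2. Decay the learning rate and neighborhood radius over time.
    lr = 0.5 * (1 - t / n_steps)
    radius = 1.0 + 5.0 * (1 - t / n_steps)

    # 3. Pull the BMU and its grid neighbors toward the sample; the pull
    #    weakens with distance from the BMU *on the grid*.
    grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
    influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
    grid += lr * influence[..., None] * (x - grid)
```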

Introductory Video

Neural Networks Demystified (Jupyter Notebooks)

Resources / Recommendations

Neural Networks and Deep Learning - Michael Nielsen

19,989 questions

957 votes, 18 answers

What is the role of the bias in neural networks?

I'm aware of the gradient descent and the back-propagation algorithm. What I don't get is: when is using a bias important and how do you use it? For example, when mapping the AND function with two inputs and one output, it does not give the…
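
A minimal sketch of the point usually made in the answers, using the AND function from the question (weights chosen by hand for illustration):

```python
import numpy as np

# A neuron computes step(w . x + b); the bias b shifts the decision boundary
# away from the origin. Without it, AND is unlearnable: (0,1) and (1,0) -> 0
# require w1 <= 0 and w2 <= 0, but (1,1) -> 1 requires w1 + w2 > 0.
step = lambda z: (z > 0).astype(int)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

w, b = np.array([1.0, 1.0]), -1.5
print(step(X @ w + b))   # [0 0 0 1], i.e. AND; impossible with b fixed at 0
```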

476 votes, 14 answers

Epoch vs Iteration when training neural networks

What is the difference between epoch and iteration when training a multi-layer perceptron?
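
A short sketch of the usual definitions (the numbers are illustrative): one iteration is one weight update on one batch; one epoch is one full pass over the training set.

```python
n_samples = 10_000        # training set size
batch_size = 100          # samples per weight update

iterations_per_epoch = n_samples // batch_size    # 100 updates per epoch
epochs = 5
total_iterations = epochs * iterations_per_epoch  # 500 updates in total
print(iterations_per_epoch, total_iterations)
```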

429 votes, 10 answers

What is the meaning of the word logits in TensorFlow?

In the following TensorFlow function, we must feed the activation of artificial neurons in the final layer. That I understand. But I don't understand why it is called logits. Isn't that a mathematical function? loss_function =…
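
A brief sketch of the usual reading: in TensorFlow, "logits" are the raw, unnormalized scores of the final layer, before softmax turns them into probabilities; losses with "with_logits" in the name apply softmax internally for numerical stability. (The scores below are made up.)

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])   # raw final-layer outputs, any real numbers
labels = tf.constant([[1.0, 0.0, 0.0]])   # one-hot target

probs = tf.nn.softmax(logits)             # ~[[0.66, 0.24, 0.10]], a distribution
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
print(probs.numpy(), loss.numpy())
```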

414 votes, 3 answers

Keras input explanation: input_shape, units, batch_size, dim, etc

For any Keras layer (Layer class), can someone explain how to understand the difference between input_shape, units, dim, etc.? For example, the doc says units specifies the output shape of a layer. In the image of the neural net below hidden layer1…
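
A minimal sketch, assuming the tf.keras API: units is the number of neurons in a layer (hence its output size); input_shape describes a single sample, without the batch dimension, which Keras leaves flexible (None).

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=32, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(units=10, activation="softmax"),
])
model.summary()   # output shapes print as (None, 32) and (None, 10)
```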

395 votes, 6 answers

What are advantages of Artificial Neural Networks over Support Vector Machines?

ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's not often clear which method is better for a particular project, and I'm certain the answer is…

331 votes, 7 answers

Why do we need to call zero_grad() in PyTorch?

Why does zero_grad() need to be called during training? zero_grad(self): Sets gradients of all model parameters to zero.
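
A minimal sketch of the reason: in PyTorch, backward() accumulates gradients into .grad instead of overwriting them, so they must be cleared before each update. (The model, data, and hyperparameters below are placeholders.)

```python
import torch

model = torch.nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 3), torch.randn(8, 1)

for _ in range(10):
    opt.zero_grad()                                    # clear last step's gradients
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                                    # accumulates into .grad
    opt.step()                                         # update weights using .grad
```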

330 votes, 2 answers

Extremely small or NaN values appear in training neural network

I'm trying to implement a neural network architecture in Haskell, and use it on MNIST. I'm using the hmatrix package for linear algebra. My training framework is built using the pipes package. My code compiles and doesn't crash. But the problem is,…

271 votes, 3 answers

How to interpret loss and accuracy for a machine learning model

When I train my neural network with Theano or TensorFlow, it reports a variable called "loss" per epoch. How should I interpret this variable? Is higher loss better or worse, and what does it mean for the final performance (accuracy) of my…

254 votes, 10 answers

How do I initialize weights in PyTorch?

How do I initialize weights and biases of a network (via e.g. He or Xavier initialization)?
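
One common pattern, sketched with the torch.nn.init API (the model and the choice of Xavier here are illustrative; the kaiming_* functions are the "He" variants):

```python
import torch.nn as nn

def init_weights(m):
    # Called once per submodule by model.apply(); initialize by layer type.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.apply(init_weights)
```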

238 votes, 8 answers

Ordering of batch normalization and dropout?

The original question was in regard to TensorFlow implementations specifically. However, the answers are for implementations in general. This general answer is also the correct answer for TensorFlow. When using batch normalization and dropout in…
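
One commonly used ordering, sketched in PyTorch (layer sizes are arbitrary, and other orderings are defensible): the linear layer, then batch norm, then the activation, with dropout after the activation.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128, bias=False),  # bias is redundant right before batch norm
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, 10),
)
```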

227 votes, 10 answers

Why use softmax as opposed to standard normalization?

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution. This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are…
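
A small sketch of the contrast usually drawn: softmax exponentiates before normalizing, so it stays valid for negative scores, is smooth everywhere, and sharpens the largest score; plain sum-to-one normalization does not.

```python
import numpy as np

z = np.array([2.0, 1.0, -1.0])                              # raw scores

softmax = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # numerically stable form
naive = z / z.sum()                                         # plain normalization

print(softmax)   # ~[0.705 0.259 0.035], a valid probability distribution
print(naive)     # [ 1.   0.5 -0.5], a "probability" below zero
```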

215 votes, 12 answers

Why binary_crossentropy and categorical_crossentropy give different performances for the same problem?

I'm trying to train a CNN to categorize text by topic. When I use binary cross-entropy I get ~80% accuracy, with categorical cross-entropy I get ~50% accuracy. I don't understand why this is. It's a multiclass problem, doesn't that mean that I have…
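
The usual resolution is that the loss must match the output activation and the label encoding; with a mismatched pairing, Keras can report a misleading accuracy. A sketch of the two standard pairings (shapes are illustrative):

```python
import tensorflow as tf

# Multi-class, one true class per sample: softmax + categorical_crossentropy
# (one-hot labels; use sparse_categorical_crossentropy for integer labels).
multiclass = tf.keras.Sequential(
    [tf.keras.layers.Dense(5, activation="softmax", input_shape=(100,))])
multiclass.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])

# Binary or multi-label, independent yes/no targets: sigmoid + binary_crossentropy.
multilabel = tf.keras.Sequential(
    [tf.keras.layers.Dense(5, activation="sigmoid", input_shape=(100,))])
multilabel.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```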

209 votes, 8 answers

Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read this documentation for it: http://keras.io/layers/normalization/ I don't see where I'm supposed to call it. Below is my code…
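
A minimal sketch of the usage: BatchNormalization is not called once globally; it is an ordinary layer, inserted at each point where activations should be normalized (the architecture below is a placeholder).

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(64, input_shape=(20,)),
    layers.BatchNormalization(),   # normalizes the first hidden layer's outputs
    layers.Activation("tanh"),
    layers.Dense(64),
    layers.BatchNormalization(),   # and the second hidden layer's
    layers.Activation("tanh"),
    layers.Dense(1, activation="sigmoid"),
])
```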

192 votes, 13 answers

Why must a nonlinear activation function be used in a backpropagation neural network?

I've been reading some things on neural networks and I understand the general principle of a single-layer neural network. I understand the need for additional layers, but why are nonlinear activation functions used? This question is followed by this…
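
A small sketch of the core argument: without a nonlinearity, stacked layers collapse into a single linear map, so depth adds no expressive power.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))
x = rng.normal(size=(1, 4))

two_linear_layers = (x @ W1) @ W2   # a "deep" network with identity activations
one_layer = x @ (W1 @ W2)           # an equivalent single layer

print(np.allclose(two_linear_layers, one_layer))   # True
```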

188 votes, 10 answers

Why do we have to normalize the input for an artificial neural network?

Why do we have to normalize the input for a neural network? I understand that sometimes, for example when the input values are non-numerical, a certain transformation must be performed, but what about when we have numerical input? Why must the numbers be in a…
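
A minimal sketch of one common scheme, standardization: rescale each feature to zero mean and unit variance using statistics from the training set only, so features on very different scales contribute comparably (the data below is made up).

```python
import numpy as np

X_train = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

X_train_norm = (X_train - mean) / std                   # per-feature z-scores
X_new_norm = (np.array([[2.5, 500.0]]) - mean) / std    # reuse the same statistics
print(X_train_norm)
```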