Questions tagged [deep-learning]

Deep Learning is an area of machine learning whose goal is to learn complex functions using special neural network architectures that are "deep" (consist of many layers). This tag should be used for questions about the implementation of deep learning architectures. General machine learning questions should be tagged "machine learning". Including a tag for the relevant software library (e.g., "keras", "tensorflow", "pytorch", "fast.ai") is helpful.

Deep Learning is a branch of machine learning aimed at learning complex functions using special neural network architectures with many layers (hence the term "deep").

Deep neural network architectures allow more complex tasks to be learned for two reasons: the additional layers perform more transformations, and the larger number of layers, together with more complex architectures, allows a hierarchical organization of functionality to emerge.

Deep Learning was introduced into machine learning research with the intention of moving machine learning closer to artificial intelligence. A significant impact of deep learning lies in feature learning, which removes much of the manual feature engineering effort required by non-deep neural networks.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise your question is probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).


27406 questions
114
votes
4 answers

multi-layer perceptron (MLP) architecture: criteria for choosing number of hidden layers and size of the hidden layer?

If we have 10 eigenvectors then we can have 10 neural nodes in the input layer. If we have 5 output classes then we can have 5 nodes in the output layer. But what are the criteria for choosing the number of hidden layers in an MLP and how many neural nodes in 1…
110
votes
4 answers

What's the difference between torch.stack() and torch.cat() functions?

OpenAI's REINFORCE and actor-critic example for reinforcement learning has the following code: REINFORCE: policy_loss = torch.cat(policy_loss).sum() actor-critic: loss = torch.stack(policy_losses).sum() + torch.stack(value_losses).sum() One is…
Gulzar
  • 23,452
  • 27
  • 113
  • 201
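The distinction asked about in this question can be sketched with a small shape comparison. This is a minimal illustration, assuming PyTorch is installed; the tensors `a` and `b` are made-up placeholders, not the policy losses from the question.

```python
import torch

# Two placeholder tensors with identical shape (2, 3).
a = torch.zeros(2, 3)
b = torch.ones(2, 3)

# torch.cat joins tensors along an EXISTING dimension,
# so the size of that dimension grows.
cat_result = torch.cat([a, b], dim=0)

# torch.stack joins tensors along a NEW dimension,
# so the result has one more dimension than the inputs.
stack_result = torch.stack([a, b], dim=0)

print(cat_result.shape)    # torch.Size([4, 3])
print(stack_result.shape)  # torch.Size([2, 2, 3])
```

One practical consequence: `torch.cat` cannot concatenate zero-dimensional (scalar) tensors because they have no existing dimension to join along, while `torch.stack` can combine them into a 1-D tensor.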
109
votes
5 answers

What's the difference between "hidden" and "output" in PyTorch LSTM?

I'm having trouble understanding the documentation for PyTorch's LSTM module (and also RNN and GRU, which are similar). Regarding the outputs, it says: Outputs: output, (h_n, c_n) output (seq_len, batch, hidden_size * num_directions): tensor…
N. Virgo
  • 7,970
  • 11
  • 44
  • 65
108
votes
6 answers

Keras, how do I predict after I trained a model?

I'm playing with the reuters-example dataset and it runs fine (my model is trained). I read about how to save a model, so I could load it later to use again. But how do I use this saved model to predict a new text? Do I use models.predict()? Do I…
bky
  • 1,314
  • 3
  • 11
  • 14
104
votes
4 answers

What does global_step mean in Tensorflow?

In this tutorial code from the TensorFlow website, could anyone help explain what global_step means? I found on the TensorFlow website that global step is used to count training steps, but I don't quite get what exactly it means. Also,…
GabrielChu
  • 6,026
  • 10
  • 27
  • 42
103
votes
4 answers

How to do gradient clipping in pytorch?

What is the correct way to perform gradient clipping in pytorch? I have an exploding gradients problem.
Gulzar
  • 23,452
  • 27
  • 113
  • 201
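One common approach to the problem described in this question is to clip the global gradient norm between `backward()` and `step()`. The sketch below assumes PyTorch is installed; the model, data, learning rate, and `max_norm` value are hypothetical placeholders chosen only to make the example self-contained.

```python
import torch
from torch import nn

# Hypothetical tiny model and optimizer, used only for illustration.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Made-up data, scaled up so the gradients are large.
inputs = torch.randn(8, 4) * 100
targets = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# Rescale all gradients in place so their combined L2 norm
# does not exceed max_norm. This must happen after backward()
# and before optimizer.step().
max_norm = 1.0
nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)

optimizer.step()
```

`torch.nn.utils.clip_grad_value_` is an alternative that clamps each gradient element to a fixed range instead of rescaling by the overall norm; which of the two works better depends on the model.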
102
votes
2 answers

What is the intuition of using tanh in LSTM?

In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If it is, can I change both to another activation function (e.g., ReLU)?
102
votes
8 answers

How big should batch size and number of epochs be when fitting a model?

My training set has 970 samples and validation set has 243 samples. How big should batch size and number of epochs be when fitting a model to optimize the val_acc? Is there any sort of rule of thumb to use based on data input size?
pr338
  • 8,730
  • 19
  • 52
  • 71
101
votes
3 answers

What is the difference between sparse_categorical_crossentropy and categorical_crossentropy?

What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used as opposed to the other? For example, are these losses suitable for linear regression?
xpertdev
  • 1,293
  • 2
  • 6
  • 12
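The difference between the two losses in this question is mainly the label format, which a small hand-rolled sketch can show. The functions below are illustrative reimplementations in plain Python, not the Keras API; the probability vector is a made-up example.

```python
import math

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy when the label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy when the label is an integer class index."""
    return -math.log(probs[class_index])

# Hypothetical predicted class probabilities for one sample (3 classes).
probs = [0.1, 0.7, 0.2]

# The same true class (class 1), encoded in the two label formats.
loss_dense = categorical_crossentropy([0, 1, 0], probs)
loss_sparse = sparse_categorical_crossentropy(1, probs)

# Both formats yield the same loss value.
print(math.isclose(loss_dense, loss_sparse))  # True
```

Both are classification losses over class probabilities, so neither is a natural fit for regression targets.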
101
votes
6 answers

Using a pre-trained word embedding (word2vec or Glove) in TensorFlow

I've recently reviewed an interesting implementation for convolutional text classification. However, all the TensorFlow code I've reviewed uses random (not pre-trained) embedding vectors like the following: with tf.device('/cpu:0'),…
user3147590
  • 1,231
  • 2
  • 10
  • 16
99
votes
10 answers

Has anyone gotten "AttributeError: 'str' object has no attribute 'decode'" while loading a Keras saved model?

After training, I saved both the whole Keras model and the weights only, using model.save(MODEL_NAME) and model.save_weights(MODEL_WEIGHTS). The model and weights were saved successfully and there was no error. I can successfully load the weights simply using…
Rizwan
  • 1,210
  • 2
  • 9
  • 21
97
votes
4 answers

How to stack multiple lstm in keras?

I am using the deep learning library Keras and trying to stack multiple LSTMs with no luck. Below is my code: model = Sequential() model.add(LSTM(100,input_shape =(time_steps,vector_size))) model.add(LSTM(100)) The above code returns an error in the third…
Tamim Addari
  • 7,591
  • 9
  • 40
  • 59
96
votes
10 answers

How to add regularizations in TensorFlow?

I found that in much of the available neural network code implemented using TensorFlow, regularization terms are often implemented by manually adding an additional term to the loss value. My questions are: Is there a more elegant or recommended way of…
Lifu Huang
  • 11,930
  • 14
  • 55
  • 77
94
votes
2 answers

How to format the image data for training/prediction when images are different in size?

I am trying to train a model that classifies images. The problem I have is that the images have different sizes. How should I format my images and/or model architecture?
Asif Mohammed
  • 1,323
  • 1
  • 15
  • 29
93
votes
5 answers

Calculate the output size in convolution layer

How do I calculate the output size in a convolution layer? For example, I have a 2D convolution layer that takes a 3x128x128 input and has 40 filters of size 5x5.
Monk247uk
  • 1,170
  • 1
  • 8
  • 15
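The calculation asked about in this question can be sketched with the standard formula, output = floor((input + 2 * padding - kernel) / stride) + 1, applied to each spatial dimension. The helper name `conv2d_output_shape` is hypothetical, and stride 1 with no padding is an assumption, since the question does not state them.

```python
def conv2d_output_shape(in_shape, num_filters, kernel_size, stride=1, padding=0):
    """Return (channels, height, width) produced by a 2D convolution.

    in_shape is (channels, height, width). A square kernel with the
    same stride and padding in both spatial dimensions is assumed.
    """
    _, height, width = in_shape  # input channels do not affect output size
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # The number of output channels equals the number of filters.
    return (num_filters, out_h, out_w)

# The example from the question: 3x128x128 input, 40 filters of size 5x5.
shape = conv2d_output_shape((3, 128, 128), num_filters=40, kernel_size=5)
print(shape)  # (40, 124, 124)
```

With `padding=2` the same layer would preserve the spatial size and produce 40x128x128.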