Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a local minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of a function's parameters (coefficients) that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
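
To make the update rule above concrete, here is a minimal NumPy sketch of the basic loop; the quadratic objective, learning rate, and stopping tolerance are illustrative choices, not part of the tag wiki.

    import numpy as np

    def gradient_descent(grad, x0, learning_rate=0.1, max_iters=1000, tol=1e-8):
        """Generic gradient descent: step in the direction of the negative gradient."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iters):
            g = grad(x)
            if np.linalg.norm(g) < tol:   # stop once the gradient is (almost) zero
                break
            x = x - learning_rate * g     # step proportional to the negative gradient
        return x

    # Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is (2(x - 3), 2(y + 1)).
    minimum = gradient_descent(lambda p: np.array([2 * (p[0] - 3), 2 * (p[1] + 1)]), x0=[0.0, 0.0])
    print(minimum)  # approximately [3, -1]
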


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
37 votes, 2 answers

What is `lr_policy` in Caffe?

I am just trying to find out how I can use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand: # The learning rate policy lr_policy: "inv" Possible values seem to…
Martin Thoma • 124,992 • 159 • 614 • 958
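
For context on the question above: Caffe's "inv" policy is commonly described as decaying the base learning rate as base_lr * (1 + gamma * iter)^(-power). A small Python sketch of that schedule, with made-up parameter values:

    def inv_lr(base_lr, gamma, power, iteration):
        """Learning rate under the "inv" policy: base_lr * (1 + gamma * iter)^(-power)."""
        return base_lr * (1.0 + gamma * iteration) ** (-power)

    # Illustrative values, not taken from any particular solver.prototxt:
    for it in (0, 100, 1000, 10000):
        print(it, inv_lr(base_lr=0.01, gamma=0.0001, power=0.75, iteration=it))
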
31 votes, 2 answers

Understanding accumulated gradients in PyTorch

I am trying to understand the inner workings of gradient accumulation in PyTorch. My question is somewhat related to these two: Why do we need to call zero_grad() in PyTorch? Why do we need to explicitly call zero_grad()? Comments to the accepted…
VikingCat • 413 • 1 • 4 • 7
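
For the accumulation question above, a minimal PyTorch sketch of the usual pattern: gradients from several backward() calls sum into .grad until zero_grad() clears them. The model, data, and accumulation count below are placeholders.

    import torch

    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    accum_steps = 4  # number of mini-batches whose gradients are summed before one update

    optimizer.zero_grad()
    for step in range(100):
        x, y = torch.randn(8, 10), torch.randn(8, 1)   # placeholder mini-batch
        loss = loss_fn(model(x), y) / accum_steps       # scale so the summed gradient averages the batches
        loss.backward()                                 # gradients accumulate into param.grad
        if (step + 1) % accum_steps == 0:
            optimizer.step()                            # apply the accumulated gradient
            optimizer.zero_grad()                       # reset .grad for the next accumulation window
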
28 votes, 1 answer

Backward function in PyTorch

I have a question about PyTorch's backward function; I don't think I'm getting the right output: import numpy as np import torch from torch.autograd import Variable a = Variable(torch.FloatTensor([[1,2,3],[4,5,6]]), requires_grad=True) out = a *…
Elin • 305 • 1 • 4 • 7
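
Related to the excerpt above: backward() on a non-scalar tensor needs an explicit gradient argument. A small sketch using the current API (the Variable wrapper is no longer required); values are illustrative.

    import torch

    a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
    out = a * a                      # elementwise square, same shape as a

    # out is not a scalar, so backward() needs an upstream gradient of the same shape.
    out.backward(torch.ones_like(out))
    print(a.grad)                    # d(sum of out)/da = 2 * a
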
28 votes, 4 answers

scipy.optimize.fmin_l_bfgs_b returns 'ABNORMAL_TERMINATION_IN_LNSRCH'

I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm. sigma_sp_new, func_val, info_dict =…
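
One frequent cause of ABNORMAL_TERMINATION_IN_LNSRCH is an objective and gradient that disagree. A small sketch, with an illustrative quadratic objective, of using scipy.optimize.check_grad to verify the pair before handing it to fmin_l_bfgs_b:

    import numpy as np
    from scipy.optimize import check_grad, fmin_l_bfgs_b

    def f(x):
        return float(np.sum((x - 2.0) ** 2))   # illustrative objective

    def grad(x):
        return 2.0 * (x - 2.0)                 # its analytic gradient

    x0 = np.zeros(3)
    print("gradient error:", check_grad(f, grad, x0))   # should be close to 0 if f and grad agree

    x_opt, f_val, info = fmin_l_bfgs_b(f, x0, fprime=grad)
    print(x_opt, info["warnflag"])             # warnflag 0 means convergence
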
27 votes, 9 answers

gradient descent seems to fail

I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image has good quality. I did that in Octave. The idea is somehow based on the algorithm from the machine learning…
Tyzak • 2,430 • 8 • 38 • 52
26 votes, 2 answers

What is `weight_decay` meta parameter in Caffe?

Looking at an example 'solver.prototxt' posted on the BVLC/caffe git, there is a training meta parameter weight_decay: 0.04. What does this meta parameter mean, and what value should I assign to it?
Shai • 111,146 • 38 • 238 • 371
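
For context: weight_decay is the coefficient of the L2 regularization term added to the loss. A rough NumPy sketch of how it enters a plain SGD update (names and values are illustrative, not Caffe's internals):

    import numpy as np

    def sgd_step(w, data_grad, lr=0.01, weight_decay=0.04):
        """One SGD update with L2 weight decay: the regularizer adds weight_decay * w to the gradient."""
        total_grad = data_grad + weight_decay * w
        return w - lr * total_grad

    w = np.array([0.5, -1.2, 3.0])
    g = np.array([0.1, 0.0, -0.2])   # gradient of the data loss, placeholder values
    print(sgd_step(w, g))
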
25 votes, 1 answer

What type of orthogonal polynomials does R use?

I was trying to match, in Python, the orthogonal polynomials produced by the following R code: X <- cbind(1, poly(x = x, degree = 9)). To do this I implemented my own method for generating orthogonal polynomials: def get_hermite_poly(x,degree): …
Charlie Parker • 5,884 • 57 • 198 • 323
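
Regarding the question above: R's poly() returns orthonormal polynomial columns rather than Hermite polynomials. One commonly suggested way to reproduce them in Python is a QR decomposition of the centered Vandermonde matrix, sketched below under that assumption (column signs may differ from R's):

    import numpy as np

    def ortho_poly(x, degree):
        """Orthonormal polynomial basis of x up to `degree`, via QR of the centered Vandermonde matrix."""
        x = np.asarray(x, dtype=float)
        X = np.vander(x - x.mean(), degree + 1, increasing=True)  # columns 1, x_c, x_c^2, ...
        Q, _ = np.linalg.qr(X)
        return Q[:, 1:]            # drop the constant column, as R's poly() does

    x = np.linspace(0, 1, 20)
    P = ortho_poly(x, 9)
    print(P.T @ P)                 # approximately the identity: the columns are orthonormal
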
23 votes, 2 answers

How to accumulate gradients in tensorflow?

I have a question similar to this one. Because I have limited resources and I work with a deep model (VGG-16) used to train a triplet network, I want to accumulate gradients for 128 batches of one training example each, and then propagate the…
Hello Lili • 1,527 • 1 • 25 • 50
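
A rough sketch of the accumulation pattern using the newer tf.GradientTape API (the original question targets graph-mode TensorFlow with accumulator variables, so treat this as an analogous illustration; model and data are placeholders):

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    loss_fn = tf.keras.losses.MeanSquaredError()
    accum_steps = 128

    model.build(input_shape=(None, 5))
    accum = [tf.zeros_like(v) for v in model.trainable_variables]

    for step in range(1024):
        x = tf.random.normal((1, 5))          # batch of a single training example
        y = tf.random.normal((1, 1))
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True)) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]
        if (step + 1) % accum_steps == 0:
            optimizer.apply_gradients(zip(accum, model.trainable_variables))
            accum = [tf.zeros_like(v) for v in model.trainable_variables]
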
23 votes, 3 answers

Simple Linear Regression in Python

I am trying to implement this algorithm to find the intercept and slope for a single variable: Here is my Python code to update the intercept and slope. But it is not converging. RSS is increasing with iterations rather than decreasing, and after some…
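
An RSS that increases with iterations is the classic symptom of a learning rate that is too large for the data scale. A minimal NumPy sketch of the intercept/slope updates, with an illustrative learning rate:

    import numpy as np

    def fit_simple_regression(x, y, lr=0.01, iters=5000):
        """Gradient descent for y ≈ intercept + slope * x, minimizing RSS."""
        intercept, slope = 0.0, 0.0
        n = len(x)
        for _ in range(iters):
            error = intercept + slope * x - y
            # Gradients of RSS/n with respect to intercept and slope.
            intercept -= lr * (2.0 / n) * error.sum()
            slope -= lr * (2.0 / n) * (error * x).sum()
        return intercept, slope

    x = np.linspace(0, 10, 50)
    y = 3.0 + 2.0 * x + np.random.normal(scale=0.5, size=50)
    print(fit_simple_regression(x, y))   # roughly (3, 2); if RSS grows instead, reduce lr
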
19 votes, 1 answer

Scipy sparse CSR matrix to TensorFlow SparseTensor - Mini-Batch gradient descent

I have a Scipy sparse CSR matrix created from a sparse TF-IDF feature matrix in SVM-Light format. The number of features is huge and the matrix is sparse, so I have to use a SparseTensor or else it is too slow. For example, the number of features is 5, and a…
Salman Mohammed • 269 • 1 • 3 • 9
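
For the conversion described above, the usual route is through COO indices. A small sketch (toy matrix; the current tf.SparseTensor constructor is assumed):

    import numpy as np
    import scipy.sparse as sp
    import tensorflow as tf

    csr = sp.random(4, 5, density=0.3, format="csr", random_state=0)  # toy TF-IDF-like matrix

    coo = csr.tocoo()
    indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)   # (nnz, 2) row/col pairs
    sparse_tensor = tf.SparseTensor(indices=indices,
                                    values=coo.data.astype(np.float32),
                                    dense_shape=coo.shape)
    sparse_tensor = tf.sparse.reorder(sparse_tensor)   # ensure canonical row-major ordering
    print(sparse_tensor)
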
19 votes, 3 answers

Gradient Descent with constraints (lagrange multipliers)

I'm trying to find the minimum of a function of N parameters using gradient descent. However, I want to do that while constraining the sum of absolute values of the parameters to be 1 (or <= 1, it doesn't matter). For this reason I am using the method of…
nickb • 882 • 3 • 8 • 22
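
An alternative to Lagrange multipliers for this kind of constraint is projected gradient descent: take an ordinary gradient step, then project back onto the feasible set. The sketch below projects onto the probability simplex (nonnegative parameters summing to 1), the simplest version of the constraint in the question; the objective is a placeholder.

    import numpy as np

    def project_to_simplex(v):
        """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
        u = np.sort(v)[::-1]
        cumsum = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > (cumsum - 1))[0][-1]
        theta = (cumsum[rho] - 1) / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    def grad(w):                       # placeholder objective: ||w - target||^2
        target = np.array([0.7, 0.2, 0.1, 0.0])
        return 2.0 * (w - target)

    w = np.full(4, 0.25)               # feasible starting point
    for _ in range(200):
        w = project_to_simplex(w - 0.05 * grad(w))
    print(w, w.sum())                  # the iterate stays on the simplex
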
18 votes, 3 answers

Cost function in logistic regression gives NaN as a result

I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function: t = 1 ./ (1 +…
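
The NaNs described above usually come from log(0) once the sigmoid saturates to exactly 0 or 1 in floating point. A NumPy sketch of a numerically stable cross-entropy computed directly from the logits z = X·θ (variable names are illustrative):

    import numpy as np

    def stable_logistic_cost(theta, X, y):
        """Mean cross-entropy from logits, avoiding log(0): log(1 + e^z) - y*z == -[y*log(h) + (1-y)*log(1-h)]."""
        z = X @ theta
        return np.mean(np.logaddexp(0.0, z) - y * z)

    X = np.array([[1.0, 2.0], [1.0, -30.0], [1.0, 40.0]])   # extreme values would overflow a naive sigmoid/log
    y = np.array([1.0, 0.0, 1.0])
    theta = np.array([0.5, 1.0])
    print(stable_logistic_cost(theta, X, y))    # finite, no NaN
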
17 votes, 1 answer

Custom loss function in Keras to penalize false negatives

I am working on a medical dataset where I am trying to have as few false negatives as possible. A prediction of "disease when actually no disease" is okay for me, but a prediction of "no disease when actually a disease" is not. That is, I am okay with…
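
A common way to penalize false negatives more heavily is to upweight the positive-class term of binary cross-entropy. A minimal tf.keras sketch; the weight value and model are illustrative assumptions:

    import tensorflow as tf

    def weighted_bce(false_negative_weight=5.0):
        """Binary cross-entropy where missing a positive (a false negative) costs `false_negative_weight` times more."""
        def loss(y_true, y_pred):
            y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)   # avoid log(0)
            per_example = -(false_negative_weight * y_true * tf.math.log(y_pred)
                            + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
            return tf.reduce_mean(per_example)
        return loss

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(10,))])
    model.compile(optimizer="adam", loss=weighted_bce(5.0))
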
15 votes, 1 answer

How to interpret caffe log with debug_info?

When facing difficulties during training (nans, loss does not converge, etc.) it is sometimes useful to look at a more verbose training log by setting debug_info: true in the 'solver.prototxt' file. The training log then looks something like: I1109…
Shai • 111,146 • 38 • 238 • 371
15 votes, 2 answers

Caffe: What can I do if only a small batch fits into memory?

I am trying to train a very large model. Therefore, I can only fit a very small batch size into GPU memory. Working with small batch sizes results in very noisy gradient estimates. What can I do to avoid this problem?
Shai • 111,146 • 38 • 238 • 371