Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a local minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of a function's parameters (coefficients) that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
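
To make the update rule above concrete, here is a minimal NumPy sketch of the basic loop; the quadratic objective, learning rate, and stopping tolerance are illustrative choices, not part of the tag wiki.

    import numpy as np

    def gradient_descent(grad, x0, learning_rate=0.1, max_iters=1000, tol=1e-8):
        """Generic gradient descent: step in the direction of the negative gradient."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iters):
            g = grad(x)
            if np.linalg.norm(g) < tol:   # stop once the gradient is (almost) zero
                break
            x = x - learning_rate * g     # step proportional to the negative gradient
        return x

    # Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is (2(x - 3), 2(y + 1)).
    minimum = gradient_descent(lambda p: np.array([2 * (p[0] - 3), 2 * (p[1] + 1)]), x0=[0.0, 0.0])
    print(minimum)  # approximately [3, -1]
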


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
37 votes, 2 answers

What is `lr_policy` in Caffe?

I am just trying to find out how I can use Caffe. To do so, I took a look at the different .prototxt files in the examples folder. There is one option I don't understand: # The learning rate policy lr_policy: "inv" Possible values seem to…
Martin Thoma • 124,992 • 159 • 614 • 958
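
For context on the question above: Caffe's "inv" policy is commonly described as decaying the base learning rate as base_lr * (1 + gamma * iter)^(-power). A small Python sketch of that schedule, with made-up parameter values:

    def inv_lr(base_lr, gamma, power, iteration):
        """Learning rate under the "inv" policy: base_lr * (1 + gamma * iter)^(-power)."""
        return base_lr * (1.0 + gamma * iteration) ** (-power)

    # Illustrative values, not taken from any particular solver.prototxt:
    for it in (0, 100, 1000, 10000):
        print(it, inv_lr(base_lr=0.01, gamma=0.0001, power=0.75, iteration=it))
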
31 votes, 2 answers

Understanding accumulated gradients in PyTorch

I am trying to understand the inner workings of gradient accumulation in PyTorch. My question is somewhat related to these two: Why do we need to call zero_grad() in PyTorch? Why do we need to explicitly call zero_grad()? Comments to the accepted…
VikingCat • 413 • 1 • 4 • 7
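
For the accumulation question above, a minimal PyTorch sketch of the usual pattern: gradients from several backward() calls sum into .grad until zero_grad() clears them. The model, data, and accumulation count below are placeholders.

    import torch

    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    accum_steps = 4  # number of mini-batches whose gradients are summed before one update

    optimizer.zero_grad()
    for step in range(100):
        x, y = torch.randn(8, 10), torch.randn(8, 1)   # placeholder mini-batch
        loss = loss_fn(model(x), y) / accum_steps       # scale so the summed gradient averages the batches
        loss.backward()                                 # gradients accumulate into param.grad
        if (step + 1) % accum_steps == 0:
            optimizer.step()                            # apply the accumulated gradient
            optimizer.zero_grad()                       # reset .grad for the next accumulation window
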
28 votes, 1 answer

Backward function in PyTorch

I have a question about PyTorch's backward function; I don't think I'm getting the right output: import numpy as np import torch from torch.autograd import Variable a = Variable(torch.FloatTensor([[1,2,3],[4,5,6]]), requires_grad=True) out = a *…
Elin • 305 • 1 • 4 • 7
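
Related to the excerpt above: backward() on a non-scalar tensor needs an explicit gradient argument. A small sketch using the current API (the Variable wrapper is no longer required); values are illustrative.

    import torch

    a = torch.tensor([[1., 2., 3.], [4., 5., 6.]], requires_grad=True)
    out = a * a                      # elementwise square, same shape as a

    # out is not a scalar, so backward() needs an upstream gradient of the same shape.
    out.backward(torch.ones_like(out))
    print(a.grad)                    # d(sum of out)/da = 2 * a
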
28 votes, 4 answers

scipy.optimize.fmin_l_bfgs_b returns 'ABNORMAL_TERMINATION_IN_LNSRCH'

I am using scipy.optimize.fmin_l_bfgs_b to solve a Gaussian mixture problem. The means of the mixture distributions are modeled by regressions whose weights have to be optimized using the EM algorithm. sigma_sp_new, func_val, info_dict =…
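
One frequent cause of ABNORMAL_TERMINATION_IN_LNSRCH is an objective and gradient that disagree. A small sketch, with an illustrative quadratic objective, of using scipy.optimize.check_grad to verify the pair before handing it to fmin_l_bfgs_b:

    import numpy as np
    from scipy.optimize import check_grad, fmin_l_bfgs_b

    def f(x):
        return float(np.sum((x - 2.0) ** 2))   # illustrative objective

    def grad(x):
        return 2.0 * (x - 2.0)                 # its analytic gradient

    x0 = np.zeros(3)
    print("gradient error:", check_grad(f, grad, x0))   # should be close to 0 if f and grad agree

    x_opt, f_val, info = fmin_l_bfgs_b(f, x0, fprime=grad)
    print(x_opt, info["warnflag"])             # warnflag 0 means convergence
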
27 votes, 9 answers

gradient descent seems to fail

I implemented a gradient descent algorithm to minimize a cost function in order to obtain a hypothesis for determining whether an image has good quality. I did that in Octave. The idea is somehow based on the algorithm from the machine learning…
Tyzak • 2,430 • 8 • 38 • 52
26 votes, 2 answers

What is `weight_decay` meta parameter in Caffe?

Looking at an example 'solver.prototxt' posted on the BVLC/caffe git, there is a training meta parameter weight_decay: 0.04. What does this meta parameter mean, and what value should I assign to it?
Shai • 111,146 • 38 • 238 • 371
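
For context: weight_decay is the coefficient of the L2 regularization term added to the loss. A rough NumPy sketch of how it enters a plain SGD update (names and values are illustrative, not Caffe's internals):

    import numpy as np

    def sgd_step(w, data_grad, lr=0.01, weight_decay=0.04):
        """One SGD update with L2 weight decay: the regularizer adds weight_decay * w to the gradient."""
        total_grad = data_grad + weight_decay * w
        return w - lr * total_grad

    w = np.array([0.5, -1.2, 3.0])
    g = np.array([0.1, 0.0, -0.2])   # gradient of the data loss, placeholder values
    print(sgd_step(w, g))
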
25 votes, 1 answer

What type of orthogonal polynomials does R use?

I was trying to match, in Python, the orthogonal polynomials produced by the following R code: X <- cbind(1, poly(x = x, degree = 9)). To do this I implemented my own method for generating orthogonal polynomials: def get_hermite_poly(x,degree): …
Charlie Parker • 5,884 • 57 • 198 • 323
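
Regarding the question above: R's poly() returns orthonormal polynomial columns rather than Hermite polynomials. One commonly suggested way to reproduce them in Python is a QR decomposition of the centered Vandermonde matrix, sketched below under that assumption (column signs may differ from R's):

    import numpy as np

    def ortho_poly(x, degree):
        """Orthonormal polynomial basis of x up to `degree`, via QR of the centered Vandermonde matrix."""
        x = np.asarray(x, dtype=float)
        X = np.vander(x - x.mean(), degree + 1, increasing=True)  # columns 1, x_c, x_c^2, ...
        Q, _ = np.linalg.qr(X)
        return Q[:, 1:]            # drop the constant column, as R's poly() does

    x = np.linspace(0, 1, 20)
    P = ortho_poly(x, 9)
    print(P.T @ P)                 # approximately the identity: the columns are orthonormal
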
23 votes, 2 answers

How to accumulate gradients in tensorflow?

I have a question similar to this one. Because I have limited resources and I work with a deep model (VGG-16) used to train a triplet network, I want to accumulate gradients for 128 batches of one training example each, and then propagate the…
Hello Lili • 1,527 • 1 • 25 • 50
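
A rough sketch of the accumulation pattern using the newer tf.GradientTape API (the original question targets graph-mode TensorFlow with accumulator variables, so treat this as an analogous illustration; model and data are placeholders):

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    loss_fn = tf.keras.losses.MeanSquaredError()
    accum_steps = 128

    model.build(input_shape=(None, 5))
    accum = [tf.zeros_like(v) for v in model.trainable_variables]

    for step in range(1024):
        x = tf.random.normal((1, 5))          # batch of a single training example
        y = tf.random.normal((1, 1))
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True)) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]
        if (step + 1) % accum_steps == 0:
            optimizer.apply_gradients(zip(accum, model.trainable_variables))
            accum = [tf.zeros_like(v) for v in model.trainable_variables]
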
23 votes, 3 answers

Simple Linear Regression in Python

I am trying to implement this algorithm to find the intercept and slope for a single variable: Here is my Python code to update the intercept and slope. But it is not converging. RSS is increasing with iterations rather than decreasing, and after some…
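
An RSS that increases with iterations is the classic symptom of a learning rate that is too large for the data scale. A minimal NumPy sketch of the intercept/slope updates, with an illustrative learning rate:

    import numpy as np

    def fit_simple_regression(x, y, lr=0.01, iters=5000):
        """Gradient descent for y ≈ intercept + slope * x, minimizing RSS."""
        intercept, slope = 0.0, 0.0
        n = len(x)
        for _ in range(iters):
            error = intercept + slope * x - y
            # Gradients of RSS/n with respect to intercept and slope.
            intercept -= lr * (2.0 / n) * error.sum()
            slope -= lr * (2.0 / n) * (error * x).sum()
        return intercept, slope

    x = np.linspace(0, 10, 50)
    y = 3.0 + 2.0 * x + np.random.normal(scale=0.5, size=50)
    print(fit_simple_regression(x, y))   # roughly (3, 2); if RSS grows instead, reduce lr
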
19 votes, 1 answer

Scipy sparse CSR matrix to TensorFlow SparseTensor - Mini-Batch gradient descent

I have a Scipy sparse CSR matrix created from a sparse TF-IDF feature matrix in SVM-Light format. The number of features is huge and the matrix is sparse, so I have to use a SparseTensor or else it is too slow. For example, the number of features is 5, and a…
Salman Mohammed • 269 • 1 • 3 • 9
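
For the conversion described above, the usual route is through COO indices. A small sketch (toy matrix; the current tf.SparseTensor constructor is assumed):

    import numpy as np
    import scipy.sparse as sp
    import tensorflow as tf

    csr = sp.random(4, 5, density=0.3, format="csr", random_state=0)  # toy TF-IDF-like matrix

    coo = csr.tocoo()
    indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)   # (nnz, 2) row/col pairs
    sparse_tensor = tf.SparseTensor(indices=indices,
                                    values=coo.data.astype(np.float32),
                                    dense_shape=coo.shape)
    sparse_tensor = tf.sparse.reorder(sparse_tensor)   # ensure canonical row-major ordering
    print(sparse_tensor)
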
19 votes, 3 answers

Gradient Descent with constraints (lagrange multipliers)

I'm trying to find the minimum of a function of N parameters using gradient descent. However, I want to do that while constraining the sum of absolute values of the parameters to be 1 (or <= 1, it doesn't matter). For this reason I am using the method of…
nickb • 882 • 3 • 8 • 22
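
An alternative to Lagrange multipliers for this kind of constraint is projected gradient descent: take an ordinary gradient step, then project back onto the feasible set. The sketch below projects onto the probability simplex (nonnegative parameters summing to 1), the simplest version of the constraint in the question; the objective is a placeholder.

    import numpy as np

    def project_to_simplex(v):
        """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
        u = np.sort(v)[::-1]
        cumsum = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > (cumsum - 1))[0][-1]
        theta = (cumsum[rho] - 1) / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    def grad(w):                       # placeholder objective: ||w - target||^2
        target = np.array([0.7, 0.2, 0.1, 0.0])
        return 2.0 * (w - target)

    w = np.full(4, 0.25)               # feasible starting point
    for _ in range(200):
        w = project_to_simplex(w - 0.05 * grad(w))
    print(w, w.sum())                  # the iterate stays on the simplex
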
18 votes, 3 answers

Cost function in logistic regression gives NaN as a result

I am implementing logistic regression using batch gradient descent. There are two classes into which the input samples are to be classified. The classes are 1 and 0. While training the data, I am using the following sigmoid function: t = 1 ./ (1 +…
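
The NaNs described above usually come from log(0) once the sigmoid saturates to exactly 0 or 1 in floating point. A NumPy sketch of a numerically stable cross-entropy computed directly from the logits z = X·θ (variable names are illustrative):

    import numpy as np

    def stable_logistic_cost(theta, X, y):
        """Mean cross-entropy from logits, avoiding log(0): log(1 + e^z) - y*z == -[y*log(h) + (1-y)*log(1-h)]."""
        z = X @ theta
        return np.mean(np.logaddexp(0.0, z) - y * z)

    X = np.array([[1.0, 2.0], [1.0, -30.0], [1.0, 40.0]])   # extreme values would overflow a naive sigmoid/log
    y = np.array([1.0, 0.0, 1.0])
    theta = np.array([0.5, 1.0])
    print(stable_logistic_cost(theta, X, y))    # finite, no NaN
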
17 votes, 1 answer

Custom loss function in Keras to penalize false negatives

I am working on a medical dataset where I am trying to have as few false negatives as possible. A prediction of "disease when actually no disease" is okay for me, but a prediction of "no disease when actually a disease" is not. That is, I am okay with…
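
A common way to penalize false negatives more heavily is to upweight the positive-class term of binary cross-entropy. A minimal tf.keras sketch; the weight value and model are illustrative assumptions:

    import tensorflow as tf

    def weighted_bce(false_negative_weight=5.0):
        """Binary cross-entropy where missing a positive (a false negative) costs `false_negative_weight` times more."""
        def loss(y_true, y_pred):
            y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)   # avoid log(0)
            per_example = -(false_negative_weight * y_true * tf.math.log(y_pred)
                            + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
            return tf.reduce_mean(per_example)
        return loss

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(10,))])
    model.compile(optimizer="adam", loss=weighted_bce(5.0))
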
15 votes, 1 answer

How to interpret caffe log with debug_info?

When facing difficulties during training (nans, loss does not converge, etc.) it is sometimes useful to look at a more verbose training log by setting debug_info: true in the 'solver.prototxt' file. The training log then looks something like: I1109…
Shai • 111,146 • 38 • 238 • 371
15 votes, 2 answers

Caffe: What can I do if only a small batch fits into memory?

I am trying to train a very large model. Therefore, I can only fit a very small batch size into GPU memory. Working with small batch sizes results in very noisy gradient estimates. What can I do to avoid this problem?
Shai • 111,146 • 38 • 238 • 371