Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates the partial derivatives (the gradient) of the function and steps in the direction opposite to the gradient, with a step size proportional to its magnitude. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
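As a minimal sketch of that update rule (not a production optimizer), the snippet below repeatedly subtracts a fixed fraction of the gradient from the current point. The function name `gradient_descent`, the quadratic test function, and the values of `learning_rate` and `n_steps` are illustrative assumptions, not part of the tag wiki.

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Take n_steps steps, each proportional to the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        # Move against the gradient, scaled by the learning rate.
        x = x - learning_rate * grad(x)
    return x

# Hypothetical test function: f(x, y) = (x - 3)^2 + (y + 1)^2,
# whose unique minimum is at (3, -1).
def grad_f(p):
    return 2.0 * (p - np.array([3.0, -1.0]))

minimum = gradient_descent(grad_f, x0=[0.0, 0.0])
print(minimum)
```

With a fixed learning rate, convergence depends on the function's curvature: too large a rate can make the iterates diverge, while too small a rate makes progress slow.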

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.
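To illustrate the contrast, here is a small sketch on linear least squares, a problem where an analytical solution does exist, so the two approaches can be compared directly. The synthetic data, the learning rate `lr`, and the iteration count are assumptions chosen only for this example.

```python
import numpy as np

# Synthetic data (an assumption for this example): y is roughly 2 - 3*x.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])  # bias column + one feature
true_theta = np.array([2.0, -3.0])
y = X @ true_theta + 0.01 * rng.standard_normal(50)

# Analytical route: solve the least-squares problem with linear algebra.
theta_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterative route: batch gradient descent on the cost sum((X @ theta - y)^2) / (2 * n).
theta = np.zeros(2)
lr = 0.1
for _ in range(5000):
    grad = X.T @ (X @ theta - y) / len(y)
    theta -= lr * grad

print(theta_closed, theta)
```

When the closed form is available and cheap, as here, it is usually the simpler choice; the iterative search becomes useful when no such formula exists or the linear algebra is too expensive.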

Gradient descent is also known as steepest descent, or the method of steepest descent.

Tag usage:

Questions on gradient-descent should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

1428 questions

1 answer

How to obtain the convex curve for weights vs loss in a neural network

In most of the literature on neural networks, the 3D plot of weights, bias and the loss function is shown as below. When I tried, I obtained a plot like this one. Here are more details. Here is a glimpse of the dataset: there are 15,000 training…

python machine-learning neural-network deep-learning gradient-descent

asked Jan 23 '18 at 11:30

Karthic Rao

1 answer

Logistic Regression Gradient Descent

I have to do Logistic regression using batch gradient descent. import numpy as np X = np.asarray([ [0.50],[0.75],[1.00],[1.25],[1.50],[1.75],[1.75], [2.00],[2.25],[2.50],[2.75],[3.00],[3.25],[3.50], [4.00],[4.25],[4.50],[4.75],[5.00],[5.50]]) y =…

python machine-learning logistic-regression gradient-descent

asked Dec 13 '17 at 14:50

Sean

1 answer

The gradient of an output w.r.t network weights that holds another output constant

Let's assume I have a simple MLP And I have a gradient of some loss function with respect to the output layer to get G = [0, -1] (that is, increasing the second output variable decreases the loss function). If I take the gradient of G with respect…

tensorflow neural-network gradient-descent

asked Feb 11 '17 at 22:58

Robert

1 answer

Steepest descent spitting out unreasonably large values

My implementation of steepest descent for solving Ax = b is showing some weird behavior: for any matrix large enough (~10 x 10, have only tested square matrices so far), the returned x contains all huge values (on the order of 1x10^10). def…

python numpy mathematical-optimization numerical-methods gradient-descent

asked Jul 26 '16 at 15:38

Cole Zimmerman

1 answer

The grad function in both the {pracma} and the {numDeriv} libraries of R gives erroneous results

I am interested in the 1st order numerical derivative of a self-defined function pTgh_y(q,g,h) with respect to q. For a special case, pTgh_y(q,0,0) = pnorm(q). In other words pTgh_y(q,g,h) is reduced to the CDF of the standard normal when g=h=0 (see…

r gradient-descent

asked Feb 24 '16 at 23:26

Ye Tian

1 answer

What's different about momentum gradient update in Tensorflow and Theano like this?

I'm trying to use TensorFlow with my deep learning project. Here I need to implement my gradient update using this formula: I have also implemented this part in Theano, and it produced the expected answer. But when I try to use TensorFlow's…

tensorflow gradient-descent momentum

asked Feb 18 '16 at 17:10

Peter Yang

3 answers

Will larger batch size make computation time less in machine learning?

I am trying to tune the hyperparameter, i.e. batch size, in a CNN. I have a computer with a Core i7 and 12 GB of RAM, and I am training a CNN on the CIFAR-10 dataset, which can be found in this blog. Now, at first, what I have read and learnt about batch size in…

machine-learning neural-network conv-neural-network torch gradient-descent

asked Feb 02 '16 at 16:12

Setu Kumar Basak

4 answers

TensorFlow's ReluGrad claims input is not finite

I'm trying out TensorFlow and I'm running into a strange error. I edited the deep MNIST example to use another set of images, and the algorithm converges nicely again, until around iteration 8000 (accuracy 91% at that point) when it crashes with the…

gradient-descent tensorflow

asked Nov 13 '15 at 18:07

user1111929

4 answers

Gradient descent and normal equation method for solving linear regression gives different solutions

I'm working on a machine learning problem and want to use linear regression as the learning algorithm. I have implemented 2 different methods to find the parameters theta of the linear regression model: Gradient (steepest) descent and Normal equation. On the same…

matlab machine-learning linear-regression gradient-descent

asked Jun 30 '12 at 04:08

Rasto

1 answer

Why does Pytorch autograd need a scalar?

I am working through "Deep Learning for Coders with fastai & Pytorch". Chapter 4 introduces the autograd function from the PyTorch library on a trivial example. x = tensor([3.,4.,10.]).requires_grad_() def f(q): return sum(q**2) y =…

python pytorch gradient-descent fast-ai

asked Jul 26 '21 at 21:14

Mack

1 answer

Gradient descent for ridge regression

I'm trying to write code that returns the parameters for ridge regression using gradient descent. Ridge regression is defined as Where, L is the loss (or cost) function. w are the parameters of the loss function (which assimilates b). x are the…

python numpy machine-learning gradient-descent

asked Jan 26 '21 at 21:49

immb31

2 answers

Gradient descent using TensorFlow is much slower than a basic Python implementation, why?

I'm following a machine learning course. I have a simple linear regression (LR) problem to help me get used to TensorFlow. The LR problem is to find parameters a and b such that Y = a*X + b approximates an (x, y) point cloud (which I generated…

python python-3.x tensorflow linear-regression gradient-descent

asked Dec 29 '20 at 12:49

Stefan

2 answers

Why is softmax classifier gradient divided by batch size (CS231n)?

Question In CS231 Computing the Analytic Gradient with Backpropagation which is first implementing a Softmax Classifier, the gradient from (softmax + log loss) is divided by the batch size (number of data being used in a cycle of forward cost…

python machine-learning gradient-descent softmax

asked Dec 13 '20 at 12:20

mon

1 answer

Why is Loss of SGD for a dataset is not matching the pytorch code with the scratch python code for linear regression?

I'm trying to implement multiple linear regression on the wine dataset. But when I compare the results of PyTorch with my from-scratch Python code, the losses do not match. My Scratch Code: Functions: def yinfer(X, beta): return beta[0] +…

python pytorch linear-regression gradient gradient-descent

asked Sep 22 '20 at 18:53

Rest1ve

2 answers

If we can clip gradient in WGAN, why bother with WGAN-GP?

I am working on WGAN and would like to implement WGAN-GP. In its original paper, WGAN-GP is implemented with a gradient penalty because of the 1-Lipschitz constraint. But packages out there like Keras can clip the gradient norm at 1 (which by…

machine-learning gradient-descent generative-adversarial-network

asked Nov 06 '19 at 05:51

lwang024
