Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates the partial derivatives (gradient) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function f that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
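The update rule described above (steps proportional to the negative of the gradient) can be sketched in a few lines of Python. This is a minimal illustration, not code from any question below; the quadratic objective, starting point, learning rate, and step count are all assumed for the example:

```python
# Minimal gradient descent sketch on f(x) = (x - 3)^2,
# whose gradient is f'(x) = 2 * (x - 3).

def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step proportional to the negative of the gradient."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

grad_f = lambda x: 2 * (x - 3)            # gradient of (x - 3)^2
x_min = gradient_descent(grad_f, x0=0.0)
print(round(x_min, 4))                     # converges toward the minimum at x = 3
```

Choosing the learning rate matters: too large and the iterates diverge, too small and convergence is slow — which is exactly the subject of several questions under this tag.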


Tag usage:

Questions should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.


1428 questions
15
votes
2 answers

Vectorization of a gradient descent code

I am implementing batch gradient descent in MATLAB. I have a problem with the update step of theta. theta is a vector of two components (two rows). X is a matrix containing m rows (number of training samples) and n=2 columns (number of…
bigTree
  • 2,103
  • 6
  • 29
  • 45
15
votes
2 answers

Gradient descent convergence: How to decide convergence?

I learnt gradient descent through online resources (namely Machine Learning on Coursera). However, the information provided only said to repeat gradient descent until it converges. Their definition of convergence was to use a graph of the cost…
Terence Chow
  • 10,755
  • 24
  • 78
  • 141
13
votes
1 answer

Suboptimal convergence in PyTorch compared to TensorFlow when using Adam optimizer

My program for training a model in PyTorch converges worse than the TensorFlow implementation. When I switch to SGD instead of Adam, the losses are identical. With Adam, the losses are different starting at the very first epoch. I believe I'm using…
FFT
  • 929
  • 8
  • 17
13
votes
2 answers

Why do too many epochs cause overfitting?

I am reading the Deep Learning with Python book. After reading chapter 4, Fighting Overfitting, I have two questions. Why might increasing the number of epochs cause overfitting? I know increasing the number of epochs will involve…
NingLee
  • 1,477
  • 2
  • 17
  • 26
13
votes
1 answer

Using R for multi-class logistic regression

Short format: How to implement multi-class logistic regression classification algorithms via gradient descent in R? Can optim() be used when there are more than two labels? The MATLAB code is: function [J, grad] = cost(theta, X, y, lambda) m =…
Antoni Parellada
  • 4,253
  • 6
  • 49
  • 114
13
votes
5 answers

What are alternatives of Gradient Descent?

Gradient descent has a problem of local minima. We would need to run gradient descent exponentially many times to find the global minimum. Can anybody tell me about alternatives to gradient descent, with their pros and cons? Thanks.
12
votes
2 answers

Difference between autograd.grad and autograd.backward?

Suppose I have my custom loss function and I want to fit the solution of some differential equation with help of my neural network. So in each forward pass, I am calculating the output of my neural net and then calculating the loss by taking the MSE…
12
votes
1 answer

Full gradient descent in Keras

I am trying to implement full gradient descent in Keras. This means that for each epoch I am training on the entire dataset. This is why the batch size is defined to be the size of the training set. from keras.models import Sequential from…
user552231
  • 1,095
  • 3
  • 21
  • 40
12
votes
1 answer

What's the triplet loss back propagation gradient formula?

I am trying to use Caffe to implement the triplet loss described in Schroff, Kalenichenko and Philbin, "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015. I am new to this, so how do I calculate the gradient in back propagation?
12
votes
3 answers

Gradient descent in Java

I've recently started the AI class on Coursera and I have a question related to my implementation of the gradient descent algorithm. Here's my current implementation (I actually just "translated" the mathematical expressions into Java code): …
Bastian
  • 1,553
  • 13
  • 33
11
votes
4 answers

Are there alternatives to backpropagation?

I know a neural network can be trained using gradient descent and I understand how it works. Recently, I stumbled upon other training algorithms: conjugate gradient and quasi-Newton algorithms. I tried to understand how they work but the only good…
Nope
  • 153
  • 1
  • 6
11
votes
3 answers

How to get around an in-place operation error when indexing a leaf variable for a gradient update?

I am encountering an in-place operation error when trying to index a leaf variable to update gradients with a customized Shrink function. I cannot work around it. Any help is highly appreciated! import torch.nn as nn import torch import numpy as…
W.S.
  • 647
  • 1
  • 6
  • 19
11
votes
2 answers

How to determine the learning rate and the variance in a gradient descent algorithm?

I started to learn machine learning last week. When I wanted to write a gradient descent script to estimate the model parameters, I came across a problem: how to choose an appropriate learning rate and variance. I found that different (learning…
zhoufanking
  • 133
  • 1
  • 1
  • 7
10
votes
1 answer

R: implementing my own gradient boosting algorithm

I am trying to write my own gradient boosting algorithm. I understand there are existing packages like gbm and xgboost, but I wanted to understand how the algorithm works by writing my own. I am using the iris data set, and my outcome is…
Adrian
  • 9,229
  • 24
  • 74
  • 132
10
votes
3 answers

Tensorflow 2.0 doesn't compute the gradient

I want to visualize the patterns that a given feature map in a CNN has learned (in this example I'm using VGG16). To do so I create a random image, feed it through the network up to the desired convolutional layer, choose the feature map and find the…
Will
  • 165
  • 1
  • 1
  • 8