Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates partial derivatives (gradients) of the function and descends in steps proportional to those partial derivatives. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm, used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
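The update rule described above can be sketched in a few lines of plain Python. This is a minimal illustration on a made-up toy dataset; the names `lr`, `w`, and `b` are illustrative, not taken from any question below:

```python
# Minimal sketch: fit y = w*x + b to a tiny made-up dataset by
# gradient descent on the mean-squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
n = len(xs)

w, b = 0.0, 0.0
lr = 0.05                    # learning rate: step size along the negative gradient

for _ in range(2000):
    # Partial derivatives of (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Step proportional to the negative of the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)                  # approaches w=2, b=1
```

Here `lr` plays the role discussed in several of the questions below: too large and the steps overshoot the minimum, too small and convergence becomes very slow.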


Tag usage:

Questions with this tag should be about implementation and programming problems, not the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.



1428 questions
-2 votes, 1 answer

backpropagation with more than one node per layer

I read this article about how backpropagation works, and I understood everything they said. They said that to find the gradient we have to take the partial derivative of the cost function with respect to each weight/bias. However, to explain this they used a…
-2 votes, 1 answer

How does learning rate influence gradient descent?

When gradient descent quantitatively suggests by how much the biases and weights should be reduced, what is the learning rate doing? I am a beginner; someone please enlighten me on this.
-2 votes, 1 answer

Keras, Stochastic Gradient Descent - what do the parameters mean?

I don't know in detail how the Stochastic Gradient Descent algorithm works, and I don't need to know at the moment. What I know is that it minimizes the loss function by calculating gradients and moving in the direction of the local minimum. But I'm using…
Damian
-2 votes, 2 answers

How to know correct learning rate for GradientDescentOptimizer in Tensorflow?

I am confused about the learning rate of the Gradient Descent Optimizer in TensorFlow. Suppose I am trying to predict the next value from this data: x_data = [5,10,15,20,25,30,35,40] y_data = [2,4,6,8,10,12,14,16] If I choose a learning rate of 0.01, here is…
-2 votes, 1 answer

Should gradient-descent give exactly the same answer as a least-squares method for fitting a regression?

I.e. will the output of GD be an approximation to the LS-determined value, or are these equivalent problems with identical output? Does it perhaps depend on the type of regression: linear, logistic, etc.?
-2 votes, 1 answer

Local minima in Backpropagation algorithm

The addition of an extra term, called a proportional factor, reduces the convergence of the backpropagation algorithm. So how can local minima be avoided in the backpropagation algorithm?
-2 votes, 1 answer

Gradient descent : should delta value be scalar or vector?

When computing the delta values for a neural network after running backpropagation: the value of delta(1) will be a scalar value, but should it be a vector? Update: Taken from…
blue-sky
-2 votes, 1 answer

Python implementation of gradient descent (Machine Learning)

I have tried to implement gradient descent here in Python, but the cost J just seems to be increasing irrespective of the lambda and alpha values; I am unable to figure out what the issue here is. It'll be great if someone can help me out with this.…
rohit
-3 votes, 1 answer

Normalization in linear regression (gradient descent)

I am writing a simple (gradient descent) code for linear regression with a multi-variable data set. My problem is that when I was testing the code I noticed that the cost was still decreasing after 5 million iterations, which means that my learning rate…
-3 votes, 1 answer

I am unable to get this gradient descent solution correct

Consider a linear-regression model with N=3 and D=1 with input-output pairs as follows: y1=22, x1=1, y2=3, x2=1, y3=3, x3=2. What is the gradient of the mean-squared error (MSE) with respect to B1 (when B0=0 and B1=1)? Give your answer correct to two…
-3 votes, 2 answers

When do weights stop updating?

I'm implementing gradient descent for an assignment and am confused about when the weights are supposed to stop updating. Do I stop updating the weights when they don't change very much, i.e. when weight_i - weight_previous_i <= (some…
-3 votes, 1 answer

When to use learning rate finder

Reading the paper 'Cyclical Learning Rates for Training Neural Networks' (https://arxiv.org/abs/1506.01186): does it make sense to use the learning rate finder if the model is over-fitting? Other than reducing the number of iterations before the model…
blue-sky
-3 votes, 2 answers

MLP with partial_fit() performing worse than with fit() in a supervised classification

The learning dataset I'm using is a grayscale image that was flattened so that each pixel represents an individual sample. The second image will be classified pixel by pixel after training the Multilayer Perceptron (MLP) classifier on the former…
-4 votes, 3 answers

In a neural network, the error doesn't increase after reaching its minimum value. Can you please clarify?

In gradient descent, we adjust weights to reach the global minimum of the error. But the hyperplane of gradient descent shows a boat-like structure, which means that after the error reaches its minimum value, it increases again to create the boat-like…
-4 votes, 1 answer

How to apply gradient descent on the weights of a neural network?

Consider a neural network with two hidden layers. In this case we have three weight matrices. Let's say I'm starting the training: in the first round I'll set random values for all the weights of the three matrices. If this is correct, I have two…