Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates the partial derivatives (gradient) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
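
For illustration, here is a minimal sketch of that update rule in Python, applied to the hypothetical one-dimensional function f(x) = (x - 3)^2; the function, learning rate, and stopping tolerance are illustrative choices rather than anything prescribed by the tag wiki.

    # Minimal gradient descent sketch: minimize f(x) = (x - 3)^2.
    # Each step moves against the derivative, i.e. in the direction of steepest descent.
    def gradient_descent(df, x0, lr=0.1, tol=1e-6, max_iter=1000):
        x = x0
        for _ in range(max_iter):
            step = lr * df(x)          # step proportional to the gradient
            x = x - step               # move toward the negative gradient
            if abs(step) < tol:        # stop once the updates become tiny
                break
        return x

    f_prime = lambda x: 2 * (x - 3)    # derivative of (x - 3)^2
    print(gradient_descent(f_prime, x0=0.0))  # converges toward the minimizer x = 3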

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.


Tag usage:

Questions should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
-1 votes, 1 answer

Gradient vanishes when using batch normalization in caffe

Hi all, I run into problems when I use batch normalization in Caffe. Here is the code I used in train_val.prototxt. layer { name: "conv1" type: "Convolution" bottom: "conv0" top: "conv1" param { lr_mult: 1 …
-1 votes, 1 answer

Gradient-descent code error

I wrote the code below for the gradient descent algorithm. I get an error; can anyone tell me why and how I can fix it? gradient <- function(h, start, alpha = 0.01, tolerance = 0.0001, debug = FALSE) { MAXITER <- 1000 x_old <- start …
-1 votes, 1 answer

When should I use linear neural networks and when non-linear?

I am using feed-forward neural networks trained with gradient descent backpropagation. Currently I have only worked with non-linear networks where tanh is the activation function. I was wondering: what kind of tasks would you give to a neural network with…
-1 votes, 1 answer

What is the correct approach to update weights vectors in neural network?

I don't know how to update weights in neural network back-propagation. I train on 18 tuples of data. When should I update the weights? After all tuples are calculated, or after each tuple is calculated? Please help me. Thanks in advance.
-1 votes, 1 answer

How do I get backpropagation to work for an MLP? MATLAB

I am trying to get an MLP to work. My goal is to get the net to predict output Yt when given Yt-1,Yt-2...,Yt-10. I've been using a generated dataset, which should be no trouble. My net will always output a straight line and will shift that line up…
-2 votes, 1 answer

Coefficient for the gradient term in stochastic gradient descent (SGD) with momentum

I'm studying SGD with momentum and have come across two versions of the update formula. The first is from a wiki: dw = a * dw - lr * dL/dw; w := w + dw # w: weights; lr: learning rate; dL/dw: derivative of the loss function with respect to w. The second version is…
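
For reference, here is a minimal sketch of the first (wiki) variant of that momentum update; the quadratic example loss, learning rate, and momentum coefficient are illustrative assumptions, not taken from the question.

    import numpy as np

    # Sketch of SGD with momentum, following the quoted form:
    #   dw = a * dw - lr * dL/dw;  w := w + dw
    def sgd_momentum(grad_fn, w, lr=0.01, momentum=0.9, steps=200):
        v = np.zeros_like(w)              # the "dw" term in the excerpt: the velocity
        for _ in range(steps):
            g = grad_fn(w)                # dL/dw at the current weights
            v = momentum * v - lr * g     # dw = a * dw - lr * dL/dw
            w = w + v                     # w := w + dw
        return w

    # Example: minimize L(w) = ||w||^2 / 2, whose gradient is simply w.
    print(sgd_momentum(lambda w: w, np.array([5.0, -3.0])))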
-2 votes, 2 answers

overflow encountered in scalar power error (Linear Regression & Gradient Descent With Large Digit)

So I was trying out a manual gradient descent with large numbers and got "overflow encountered in scalar power". I use this dataset from Kaggle to calculate land price: X = LT, Y = Harga…
-2 votes, 1 answer

Too accurate to be true! (linear regression)

I have been trying to code a gradient descent algorithm from scratch for multi-feature linear regression, but when I predict on my own training dataset I get results that are too accurate. class gradientdescent: def fit(self,X,Y): …
-2 votes, 1 answer

ZeroDivisionError: division by zero error in gradient descent.py

def computeCost(X,y,theta,lam): tobesummed = np.power(((X.dot(theta.T))-y),2)+lam*np.sum(np.power(theta,2)) return np.sum(tobesummed)/(2 * len(X)) def denormalise_price(price): global mean global stddev ret = price * stddev +…
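
The usual cause of this particular error is len(X) being zero, for example when the data fails to load or a filter leaves no rows. Below is a minimal sketch of a similarly regularized cost function with an explicit guard; the array shapes and the guard itself are assumptions, since the full code is not shown.

    import numpy as np

    # Ridge-regularized squared-error cost, guarding against an empty design matrix.
    # Assumes X is (m, n), theta is (1, n), y is (m, 1); mirrors the excerpt above.
    def compute_cost(X, y, theta, lam):
        m = len(X)
        if m == 0:                        # avoid ZeroDivisionError on empty data
            raise ValueError("X is empty; check how the data was loaded")
        residuals = X.dot(theta.T) - y
        cost = np.sum(np.power(residuals, 2)) + lam * np.sum(np.power(theta, 2))
        return cost / (2 * m)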
-2 votes, 1 answer

Fluctuations of gradient descent

I'm studying neural networks and I have some questions about the theory of gradient descent. Why is the fluctuation (slope) of batch gradient descent less than the fluctuation (slope) of SGD? Why does SGD avoid local minima better than batch gradient…
-2 votes, 1 answer

Is Gradient Descent always used during backpropagation for updating weights?

Gradient descent, rmsprop, and adam are optimizers. Assume I have chosen the adam or rmsprop optimizer while compiling the model, i.e. model.compile(optimizer = "adam"). My doubt is: during backpropagation, is gradient descent used for updating the weights…
-2 votes, 2 answers

Linear regression using gradient descent; having trouble with cost function value

I'm coding linear regression using gradient descent, with a for loop rather than tensors. I think my code is logically right, and when I plot the graph the theta values and the linear model seem to come out well. But the value of the cost function is high. Can…
-2 votes, 2 answers

I'm coding steps for gradient descent, to find the updated w and b, and I am having issues with my Python code

The formulas I am using to find the updated w and b parameters: for w: new w = w_current - learning rate * partial derivative with respect to w; for b: new b = b_current - learning rate * partial derivative with respect to b. From the picture I am…
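
A minimal sketch of that update step for one-feature linear regression under a mean-squared-error cost; the gradient expressions and variable names are assumptions, since the question excerpt does not show the cost function or the data.

    import numpy as np

    # One gradient descent step for the model y ≈ w * x + b under mean squared error.
    def step(w, b, x, y, lr=0.01):
        error = w * x + b - y
        dw = np.mean(error * x)       # partial derivative of the cost with respect to w
        db = np.mean(error)           # partial derivative of the cost with respect to b
        new_w = w - lr * dw           # new w = w_current - learning rate * dJ/dw
        new_b = b - lr * db           # new b = b_current - learning rate * dJ/db
        return new_w, new_b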
-2 votes, 2 answers

Is there any specific data behavior that is responsible for overfitting and underfitting?

Since I'm new to data science, I just want to know whether there is any specific data behavior that is responsible for overfitting and/or underfitting. Because if we are dealing with linear regression and we are supposed to get the best-fit line…
-2 votes, 1 answer

Odd behavior of cost over time with SGD

I am relatively new to ML/DL and have been trying to improve my skills by making a model that learns the MNIST data set without TF or keras. I have 784 input nodes, 2 hidden layers of 16 neurons each, and 10 output nodes corresponding to which…