Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a (local) minimum of a function. It iteratively calculates the partial derivatives (the gradient) of the function and moves in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
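
The update rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `gradient_descent`, its parameters, the fixed learning rate, and the example function f(x) = (x - 3)**2 are all assumptions chosen for the demonstration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0.

    grad is a callable returning the gradient at a point (hypothetical
    interface chosen for this sketch).
    """
    x = x0
    for _ in range(n_steps):
        # Step proportional to the negative of the gradient at the current point
        x = x - learning_rate * grad(x)
    return x


# Assumed example: f(x) = (x - 3)**2 has gradient 2*(x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

With a fixed step size the iterates only approach the minimum, so in practice the loop is usually stopped once the gradient or the change in `x` falls below a tolerance.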

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.
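
To make the contrast above concrete, here is a hedged sketch that fits the same linear model twice: once analytically with NumPy's least-squares solver, and once by gradient descent on the mean squared error. The synthetic data, the variable names, the learning rate, and the iteration count are assumptions picked for illustration only.

```python
import numpy as np

# Synthetic data (assumed for this sketch): y = 2 - x plus a little noise
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])  # intercept column + one feature
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=50)

# Analytic solution: solve the least-squares problem directly with linear algebra
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# Iterative solution: gradient descent on the mean squared error
w = np.zeros(2)
learning_rate = 0.1
for _ in range(2000):
    grad = (2 / len(y)) * X.T @ (X @ w - y)  # gradient of mean((X @ w - y)**2)
    w -= learning_rate * grad
```

For a small linear model like this one, both approaches arrive at essentially the same coefficients, and the analytic route is simpler. The iterative route becomes useful when no such closed-form solution is available.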

Gradient descent is also known as steepest descent, or the method of steepest descent.


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.


1428 questions
-1
votes
1 answer

How does error get backpropagated through pooling layers?

I asked a question earlier that might have been too specific, so I'll ask again in more general terms. How does error get propagated backwards through a pooling layer when there are no weights to train? In the TensorFlow video at 6:36…
-1
votes
2 answers

How to use Gradient Descent to solve this multi-term trigonometric function?

The question is: f(x) = A sin(2π * L * x) + B cos(2π * M * x) + C sin(2π * N * x), where L, M, N are constant integers with 0 <= L,M,N <= 100, and A, B, C can be any integers. Here is the given data: x =…
-1
votes
1 answer

Initial value error while using gradient descent algorithm

Problem: The initial value is 10000 and the solution converges to 10000 instead of the actual solution, 1. import numpy.linalg as nl x_ini=10000 def obj(x): f = x**2 - 2*x + 3 return f def grad(x): df = 2*x - 2 return…
-1
votes
1 answer

Linear Regression and gradient descent

In Linear Regression, we have formulas to calculate the slope and intercept of the best-fit line. Why, then, do we need Gradient Descent to calculate the optimum slope and intercept, which those formulas already give us?
-1
votes
2 answers

Loss of logistic regression model not decreasing through gradient descent

I'm a beginner in machine learning and have been trying to implement gradient descent to optimize the weights of my model. I am trying to develop the model from scratch and I have reviewed a lot of code online, but my implementation still…
-1
votes
1 answer

How to calculate the gradient in MATLAB?

I am working on pedestrian step detection (acceleration). I want to calculate statistical features from my filtered signal. I have already calculated some, and now I want to calculate the gradient. My data is a 1x37205 double array. I calculated features…
-1
votes
1 answer

Derivative using chain rule doesn't work in MATLAB

I am trying to derive the gradient and Hessian of a given function. When I compute the gradient directly it works well, but when I apply the chain rule it doesn't work and throws the following error: Error using sym/diff (line 70) Second argument must be a…
mkpisk
-1
votes
2 answers

DeprecationWarning: 'shape' argument should be used instead of 'dims'

While practicing Gradient Descent, I use the following code to plot the MSE in a 3D graph: ij_min = np.unravel_index(indices=plot_cost.argmin(), dims=plot_cost.shape) ij_min are the theta0 and theta1 values in the linear regression while plot_cost…
Sachin
-1
votes
1 answer

When implementing mini-batch gradient descent, is it better to choose the training examples randomly?

When implementing mini-batch gradient descent, is it better to choose the training examples (used to compute the derivatives) randomly? Or would it be better to shuffle the whole set of training examples, iterate through them, and reshuffle every time? The first…
-1
votes
4 answers

SGD classifier with log loss and L2 regularization, implemented in Python without using sklearn

I'm working on an assignment that requires a manual implementation of SGD in Python. I'm stuck at the dw derivative function. import numpy as np import pandas as pd from sklearn.datasets import make_classification X, y =…
-1
votes
1 answer

In backpropagation, what does it mean when the error of a neural network converges to 0.5?

I've been trying to learn the math behind neural networks and have implemented (in Octave) a version of the following equations which include bias terms. Back-propagation equations matrix form: Visual representation of the problem and…
-1
votes
1 answer

How does the derivative of the cost function give the direction of fastest decrease in cost?

I am learning gradient descent to find the minimum of a function. There I found the following line of code: m1' = m1 - alpha* d/dm1 j(m0,m1) # m0,m1 are weights, j(m0,m1) is the loss function It is stated that the partial derivative of the cost function…
-1
votes
1 answer

Gradient Descent - Logistic Regression - Weird thetas

Running my gradient descent function against the training data produces thetas of [0.3157; 0.0176; 0.0148]. The first value is significantly higher than the others. When it comes to predicting the probability of my test data, it ends up being 0.42 +-…
-1
votes
1 answer

Interpret GAN loss

I am currently training the standard DCGAN network on my dataset. After 40 epochs, the loss of both generator and discriminator is 45-50. Can someone please explain the reason and possible solution for this?
-1
votes
1 answer

Gradient descent algorithm: should I normalize the parameters too?

I have a few doubts about normalization and gradient descent that I couldn't figure out: Should I normalize the parameters as well as the samples? If I normalize the parameters before executing the gradient descent, should I denormalize the…