Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
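The description above is prose only, so here is a minimal sketch of the idea in Python (illustrative only, not part of the tag wiki; the function, starting point, learning rate, and iteration count are invented for the example):

    import numpy as np

    def f(p):
        # Function to minimize: f(x, y) = x^2 + y^2, with its minimum at (0, 0).
        return p[0] ** 2 + p[1] ** 2

    def grad_f(p):
        # Vector of partial derivatives (the gradient) of f.
        return np.array([2.0 * p[0], 2.0 * p[1]])

    p = np.array([3.0, -4.0])   # arbitrary starting point
    learning_rate = 0.1         # step size, chosen only for illustration

    for _ in range(100):
        # Take a step proportional to the negative of the gradient.
        p = p - learning_rate * grad_f(p)

    print(p, f(p))  # p ends up very close to the minimum at (0, 0)

Each iteration moves the point downhill; this sketch simply stops after a fixed number of steps, though a tolerance on the gradient norm is another common stopping criterion.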

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
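
As a sketch of the parameter-fitting use case described above (again illustrative only: the synthetic data, learning rate, and iteration count are invented, and linear regression is used purely because its cost and gradient are short to write, even though this particular model also has a closed-form solution):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.uniform(-5, 5, size=100)])  # intercept column + one feature
    true_theta = np.array([2.0, 0.5])
    y = X @ true_theta + rng.normal(scale=0.1, size=100)               # noisy observations

    theta = np.zeros(2)   # coefficients to be learned
    alpha = 0.01          # learning rate
    m = len(y)

    for _ in range(5000):
        error = X @ theta - y
        gradient = (X.T @ error) / m      # gradient of the mean-squared-error cost
        theta = theta - alpha * gradient  # step against the gradient

    print(theta)  # should land close to [2.0, 0.5]

If the learning rate is set too high the updates overshoot and the cost diverges instead of decreasing, which is a recurring theme in the questions listed below.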


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.


1428 questions
0 votes · 1 answer

None Value while optimizing data with gradient descent

I'm trying to make a small neural network in TensorFlow and I'm a bit new to this. I saw this in a tutorial (http://de.slideshare.net/tw_dsconf/tensorflow-tutorial) and everything works fine until I try to optimize the weights (with gradient…
0 votes · 1 answer

SGD convergence test using learning rates

Can anyone give an explanation for the convergence test presented in the 8th minute of this lecture by Hugo Larochelle?
0 votes · 1 answer

Logistic Regression with Gradient Descent on large data

I have a training set with about 300,000 examples and about 50-60 features; it is also a multiclass problem with about 7 classes. I have a logistic regression function that finds the convergence of the parameters using gradient descent. My gradient…
Sera_Vinicit · 149
0 votes · 1 answer

Linear regression by gradient descent in R

I am very new to machine learning and am currently trying to do a linear regression using R; my code is below: x <- runif(1000, -5, 5) y <- runif(1000, -2, 2) z <- x + y res <- lm(z ~ x + y) alpha <- 0.01 num_iters <- 10000 theta <- matrix(c(0,0,0),…
0 votes · 1 answer

Tensorflow gradients are always zero

TensorFlow gradients are always zero with respect to conv layers that come after the first conv layer. I've tried different ways to check this, but the gradients are always zero! Here is the small reproducible code that can be run to check that. from…
0 votes · 1 answer

Implementing bias neurons in a neural network

I implemented bias units for my neural network with gradient descent, but I'm not 100% sure if I've implemented it the right way. I would be glad if you could quickly look through my code. Only the parts with if bias: are important. And my second…
0 votes · 1 answer

Non-Symbolic loss in Keras/TensorFlow

For a university project, I want to train a (simulated) robot to hit a ball given the position and velocity. The first thing to try is policy gradients: I have a parametric trajectory generator. For every training position, I feed the position…
jcklie · 4,054
0 votes · 0 answers

Why is the gradient of tf.sign() not equal to 0?

I expected the gradient for tf.sign() in TensorFlow to be equal to 0 or None. However, when I examined the gradients, I found that they were equal to very small numbers (e.g. 1.86264515e-09). Why is that? (If you are curious as to why I even want to…
random_stuff · 197
0 votes · 1 answer

Stochastic Gradient Descent design matrix too big for R

I'm trying to implement a baseline prediction model of movie ratings (akin to the various baseline models from the Netflix prize), with parameters learned via stochastic gradient descent. However, because both explanatory variables are categorical…
Caio Kenup · 43
0 votes · 1 answer

Octave: steepest descent: how to minimize an equation

I am new to Octave. Now I am trying to implement the steepest descent algorithm in Octave, for example minimization of f(x1,x2) = x1^3 + x2^3 - 2*x1*x2. Estimate starting design point x0, iteration counter k0, convergence parameter tolerance = 0.1.…
voxter · 853
0 votes · 1 answer

Better alternative to gradient descent

Is there any method that is faster and more efficient than gradient descent for updating the weights in a neural network? Can we use multiplicative weight updates in place of gradient descent? Is it better?
user6460588 · 144
0 votes · 1 answer

MNIST Tensorflow vs code from Michael Nielsen

I read Michael Nielsen's book neuralnetworksanddeeplearning.com about neural networks. He always does the examples with the MNIST data. I took his code and designed exactly the same network in TensorFlow, but I realized that the results in…
jojo123456 · 341
0 votes · 1 answer

Gradient descent values not correct

I'm attempting to implement gradient descent using code from: Gradient Descent implementation in octave. I've amended the code to the following: X = [1; 1; 1;] y = [1; 0; 1;] m = length(y); X = [ones(m, 1), data(:,1)]; theta = zeros(2, 1); …
thepen · 371
0 votes · 0 answers

Gradient descent not working without normalization, why?

My question is based on the data from the Coursera course https://www.coursera.org/learn/machine-learning/, but after a search it appears to be a common problem. Gradient descent works perfectly on normalized data (pic. 1), but goes in the wrong…
0 votes · 1 answer

Implementing gradient descent with Scala and Breeze - error: could not find implicit value for parameter op:

I'm attempting to apply a gradient descent implementation in Scala and Breeze, based on the Octave code from: Gradient Descent implementation in octave. The Octave code I'm attempting to rewrite is: theta = theta - ((1/m) * ((X * theta) - y)' * X)' *…
thepen · 371