Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a minimum of a function. It iteratively calculates the partial derivatives (gradient) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
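The update rule described above (steps proportional to the negative of the gradient) can be sketched in a few lines of Python. This is a minimal illustration under assumed names, not tied to any particular question on the tag:

```python
def gradient_descent(grad, x0, alpha=0.1, num_iters=100):
    """Repeatedly step in the direction of the negative gradient."""
    x = x0
    for _ in range(num_iters):
        x = x - alpha * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; its gradient is f'(x) = 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

With a step size small enough for the function at hand, the iterate converges to the minimizer (here, 3.0); too large a step size makes it diverge, a recurring theme in the questions below.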


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
0 votes, 1 answer

Backpropagation neural network, too many neurons in layer causing output to be too high

Having a neural network with a lot of inputs causes my network problems: the neural network gets stuck, and the feed-forward calculation always gives the output as 1.0 because the output sum is too big, and while doing backpropagation, the sum of gradients…
0 votes, 2 answers

Reasons not to use tf.train.AdamOptimizer?

I've read this article and it seems like, given enough memory, you should always use Adam over the other possible optimization algorithms (adadelta, rmsprop, vanilla sgd, etc). Are there any examples, either toy or real world, in which Adam will do…
George
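For context on the Adam question above: Adam keeps exponential moving averages of the gradient and its square and uses bias-corrected versions of both to scale each step. A minimal per-parameter sketch in plain Python, following the update rule from the Adam paper; this is not the `tf.train.AdamOptimizer` implementation, and the names are illustrative:

```python
import math

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (t counts from 1)."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # moving average of its square
    m_hat = m / (1 - beta1 ** t)                # bias correction: m and v start at 0
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize (theta - 1)^2 starting from theta = 0.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2 * (theta - 1.0), m, v, t, alpha=0.05)
```

Whether Adam beats plain SGD or the other optimizers in a given setting is exactly the empirical question the post asks; the sketch only shows the mechanics of the update.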
0 votes, 1 answer

Translating Logistic Regression loss function to Softmax

I currently have a program which takes a feature vector and classification, and applies it to a known weight vector to generate a loss gradient using Logistic Regression. This is that code: double[] grad = new double[featureSize]; //dot…
user2785277
0 votes, 1 answer

Instead of LBFGS, using gradient descent in sparse autoencoder

In Andrew Ng's lecture notes, they use LBFGS and get some hidden features. Can I use gradient descent instead and produce the same hidden features? All the other parameters are the same; I only change the optimization algorithm. Because when I use…
0 votes, 0 answers

Gradient Boosting Classifier - n_estimators

I am trying the Gradient Boosting Classifier for my project. I am using 100 samples, with leave-one-out cross-validation. As far as I know, GBC should give good results with a large n_estimators, but I am getting low results with large…
0 votes, 1 answer

Gradient descent not updating theta values

Using the vectorized version of gradient descent as described at "gradient descent seems to fail": theta = theta - (alpha/m * (X * theta - y)' * X)'; The theta values are not being updated, so whatever the initial theta value is, that is the value that is set…
blue-sky
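The one-line Octave update quoted in the question above translates directly to NumPy. A sketch for linear regression, with made-up illustrative data (the variable names mirror the question, not any particular library API):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Vectorized batch gradient descent for linear regression."""
    m = len(y)
    for _ in range(num_iters):
        # NumPy equivalent of the Octave line:
        #   theta = theta - (alpha/m * (X * theta - y)' * X)';
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    return theta

# Fit y = 2x (zero intercept); X carries a bias column of ones.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=5000)
```

When theta never changes, two common causes are assigning the update to a new variable instead of back to theta, or an alpha so small that the change is invisible at print precision.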
0 votes, 2 answers

newff and train functions of Python's neurolab give inconsistent results for the same code and input

While the input is the same and the code is the same, I get two different results when run multiple times. There are only two unique outputs, though. I do not know what part of the code is randomized, and I'm having a hard time figuring out where the…
0 votes, 1 answer

Maximum Likelihood Estimation of a log function with several parameters

I am trying to find out the parameters for the function below: $$ \log L(\alpha,\beta,v) = v/\beta(e^{-\beta T} -1) + \alpha/\beta \sum_{i=1}^{n}(e^{-\beta(T-t_i)} -1) + \sum_{i=1}^{N}\log(v e^{-\beta t_i} + \alpha \sum_{j=1}^{jmax(t_i)}…
0 votes, 1 answer

Active Contours (Snakes) Gradient Descent

I am doing research on the active contour (snake) using gradient descent, which was implemented by Kass. The two pieces of documentation that I have been reading can be found here: the original paper and a more descriptive version. My question is in…
0 votes, 1 answer

"Function with duplicate name cannot be defined" error but no duplicate function

While trying to write a function for gradient descent in Matlab I got the following error: Function with duplicate name "gradientDescent" cannot be defined. The program I'm working on has two functions in it, and when I remove the second one the…
Paco Poler
0 votes, 2 answers

Plot vectors of gradient descent in R

I've coded a gradient descent algorithm in R and now I'm trying to "draw" the path of the vectors. I've drawn the points in my contour plot, but it's not correct, because nobody knows what happened first. In my algorithm I always have a previous state…
Carlos
0 votes, 1 answer

Neural Network bad convergence

I read a lot about NNs in the last two weeks; I think I saw pretty much every "XOR" approach tutorial on the net. But I wasn't able to make my own work. I started with a simple "OR" neuron approach, giving good results. I think my problem is in…
0 votes, 1 answer

Mutable Vector field is not updating in F#

let gradientDescent (X : Matrix) (y :Vector) (theta : Vector) alpha (num_iters : int) = let J_history = Vector.Build.Dense(num_iters) let m = y.Count |> double theta.At(0, 0.0) let x = …
Luke Xu
0 votes, 0 answers

Why no automatic termination for stochastic gradient descent in the frameworks?

I checked out some notable open-source frameworks with SGD implementations - scikit-learn, vowpal-wabbit and tensor-flow. All of them leave the task of deciding how many iterations to run to the user! scikit requires the user to specify it explicitly,…
ihadanny
0 votes, 1 answer

Non-vectorized Gradient Descent

I have a bug in the following code, which is returning inf, inf for the Thetas. def gradient_descent(x, y, t0, t1, alpha, num_iters): for i in range(num_iters): t0_sum = 0 t1_sum = 0 for i in range(m_num): # I have a feeling that the following…
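A returned inf, inf in a loop like the last question's usually means alpha is too large for the data, or the two parameters are not updated from the same pass. A working plain-Python sketch of the same non-vectorized update, with illustrative data (t0, t1 and the overall shape follow the question; the helper itself is an assumed reconstruction, not the asker's code):

```python
def gradient_descent(x, y, t0, t1, alpha, num_iters):
    """Non-vectorized gradient descent for the model h(x) = t0 + t1 * x."""
    m = len(x)
    for _ in range(num_iters):
        t0_sum = 0.0
        t1_sum = 0.0
        for i in range(m):
            err = (t0 + t1 * x[i]) - y[i]   # prediction error on sample i
            t0_sum += err
            t1_sum += err * x[i]
        # Update both parameters from the same pass, not one after the other.
        t0 -= alpha * t0_sum / m
        t1 -= alpha * t1_sum / m
    return t0, t1

# Fit y = 1 + 2x.
t0, t1 = gradient_descent([1.0, 2.0, 3.0], [3.0, 5.0, 7.0], 0.0, 0.0, 0.1, 5000)
```

Note also that the question's inner loop reuses `i` as the outer loop variable, which silently clobbers the iteration counter; using `_` for the outer loop avoids that.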