Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
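To make the update rule concrete, here is a minimal sketch in Python/NumPy; the quadratic example function and the learning rate are arbitrary choices for illustration, not part of the definition above:

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_iters=100):
    """Repeatedly step proportional to the negative gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - learning_rate * grad(x)  # step against the gradient
    return x

# Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2,
# whose gradient is (2(x - 3), 2(y + 1)).
grad = lambda p: np.array([2 * (p[0] - 3), 2 * (p[1] + 1)])
print(gradient_descent(grad, [0.0, 0.0]))  # approaches (3, -1)
```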


Tag usage:

Questions should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
0 votes, 1 answer

Gradient descent as applied to a feature-vector bag-of-words classification task

I've watched the Andrew Ng videos over and over and still I don't understand how to apply gradient descent to my problem. He deals pretty much exclusively in high-level conceptual explanations, but what I need are ground-level tactical…
smatthewenglish
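At ground level, the question above usually comes down to logistic regression trained by gradient descent on bag-of-words count vectors. A minimal sketch of that recipe (the toy vocabulary, documents, and labels are invented for illustration, and this is not Ng's exact notation):

```python
import numpy as np

# Hypothetical bag-of-words counts over a 4-word vocabulary, with labels.
# vocab = ["good", "bad", "great", "awful"]
X = np.array([[2, 0, 1, 0],
              [0, 2, 0, 1],
              [1, 0, 2, 0],
              [0, 1, 0, 2]], dtype=float)
y = np.array([1, 0, 1, 0])

w, b, alpha = np.zeros(X.shape[1]), 0.0, 0.1

for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
    w -= alpha * X.T @ (p - y) / len(y)      # gradient of mean cross-entropy
    b -= alpha * np.mean(p - y)

print(np.round(1.0 / (1.0 + np.exp(-(X @ w + b)))))  # should recover y
```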
0 votes, 0 answers

ADADELTA preserving randomly initialized weights in neural network

I am attempting to train a 2-hidden-layer tanh neural network on the MNIST data set using the ADADELTA algorithm. Here are the parameters of my setup: tanh activation function, 2 hidden layers with 784 units (same as the number of input…
Jeremy Salwen
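For reference, one ADADELTA step per Zeiler (2012) looks like the sketch below. Because both running averages start at zero, the first updates are tiny, which can look as if the weights were stuck at their random initialization; the toy minimization of w^2 is invented for illustration:

```python
import numpy as np

def adadelta_step(w, grad, state, rho=0.95, eps=1e-6):
    """One ADADELTA update; note there is no global learning rate."""
    Eg2, Edx2 = state                          # running averages of g^2, dx^2
    Eg2 = rho * Eg2 + (1 - rho) * grad**2
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    Edx2 = rho * Edx2 + (1 - rho) * dx**2
    return w + dx, (Eg2, Edx2)

w, state = np.array([5.0]), (np.zeros(1), np.zeros(1))
for _ in range(500):
    w, state = adadelta_step(w, 2 * w, state)  # gradient of f(w) = w^2
print(w)  # decays toward 0, very slowly at first
```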
0 votes, 0 answers

How to improve gradient descent backpropagation speed in MATLAB Neural Network Toolbox?

I am currently training several hundred different permutations of neural networks. Using Levenberg-Marquardt backpropagation yields results relatively quickly; however, I would prefer to use gradient descent for now, for academic reasons. Unfortunately,…
mesllo
0 votes, 1 answer

How can I add concurrency to neural network processing?

The basics of neural networks, as I understand them, are that there are several inputs, weights, and outputs. There can be hidden layers that add to the complexity of the whole thing. If I have 100 inputs, 5 hidden layers and one output (yes or no),…
Shamoon
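A common first step toward concurrency here is not threads but batching: expressing each layer as one matrix multiply over many inputs lets the underlying linear-algebra library parallelize the work. A sketch with the question's rough dimensions (the weights are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.standard_normal((32, 100))       # 32 samples, 100 inputs each

# Hypothetical weights: 5 hidden layers, then a single yes/no output unit.
sizes = [100, 64, 64, 64, 64, 64, 1]
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]

a = batch
for W in weights[:-1]:
    a = np.tanh(a @ W)                       # whole batch in one matmul
out = 1 / (1 + np.exp(-(a @ weights[-1])))   # sigmoid output per sample
print(out.shape)                             # (32, 1)
```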
0 votes, 0 answers

fmin_cg not minimizing enough

While doing just a simple implementation of gradient descent (predicting a straight line, with sample points as input), I predicted the line pretty accurately with the iterative method, but using fmin_cg() the accuracy went down; the first thought was to…
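A frequent cause of fmin_cg underperforming a hand-rolled loop is a gradient that does not match the cost (or relying on numerical differentiation). A sketch of a straight-line fit with an analytic gradient passed as fprime; the sample points are invented:

```python
import numpy as np
from scipy.optimize import fmin_cg

# Hypothetical points scattered around y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 5.2, 7.0, 8.9])

def cost(theta):
    r = theta[0] + theta[1] * xs - ys        # residuals of the line
    return np.mean(r**2)

def grad(theta):
    r = theta[0] + theta[1] * xs - ys
    return np.array([2 * np.mean(r), 2 * np.mean(r * xs)])

theta = fmin_cg(cost, x0=np.zeros(2), fprime=grad)
print(theta)  # should land near [1, 2]
```

scipy.optimize.check_grad is also useful for verifying that cost and grad agree before blaming the optimizer.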
0 votes, 1 answer

Gradient descent on the inputs of a pre-trained neural network to achieve a target y-value

I have a trained neural network which suitably maps my inputs to my outputs. Is it then possible to specify a desired y output and then use a gradient descent method to determine the optimum input values to get that output? When using…
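Yes: hold the trained weights fixed and run gradient descent on the input itself. A sketch with a one-hidden-layer tanh network whose weights are random stand-ins for a trained model; the target value is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)   # frozen "trained" weights
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)

target = np.array([0.7])
x = rng.standard_normal(3)          # initial guess for the input

for _ in range(200):
    h = np.tanh(W1 @ x + b1)        # forward pass
    out = W2 @ h + b2
    d_out = 2 * (out - target)      # backprop squared error to the input
    d_z1 = (W2.T @ d_out) * (1 - h**2)
    x -= 0.05 * (W1.T @ d_z1)       # descend on x; weights never change

print(out)  # should approach the target 0.7
```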
0 votes, 1 answer

Bi-Threaded processing in Matlab

I have a large-scale gradient descent optimization problem that I am running using MATLAB. The code has two parts: a sequential update part that fires every iteration and updates the parameter vector, and a validation error computation part that…
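The question is MATLAB-specific, but the structure it describes (a sequential update loop plus concurrent validation) maps onto a standard pattern, sketched here in Python with concurrent.futures; the update and validation functions are stand-ins:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def update(theta):
    time.sleep(0.01)               # stand-in for one sequential update
    return theta - 0.1

def validation_error(theta):
    time.sleep(0.05)               # stand-in for the expensive validation pass
    return theta**2

theta, pending = 10.0, None
with ThreadPoolExecutor(max_workers=1) as pool:
    for it in range(100):
        theta = update(theta)                     # runs every iteration
        if pending is None:
            pending = pool.submit(validation_error, theta)
        elif pending.done():                      # collect without blocking
            print(f"iter {it}: validation error {pending.result():.3f}")
            pending = None
```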
0 votes, 0 answers

Logistic regression with gradient descent giving different outcomes for different datasets

I am trying logistic regression using gradient descent with two data sets, and I get a different result for each of them. Dataset1 input:
X = [1 2 3
     1 4 6
     1 7 3
     1 5 5
     1 5 4
     1 6 4
     1 3 4
     1 4 5
     1 1 2
     1 3…
Sam
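Different outcomes across datasets with the same code often trace back to feature scale interacting with a fixed learning rate. A sketch that standardizes the non-bias columns before descending; the first rows of X are taken from the excerpt above, while the labels are invented:

```python
import numpy as np

def train_logreg(X, y, alpha=0.1, iters=2000):
    # Standardize non-bias columns so one learning rate fits both datasets.
    mu, sigma = X[:, 1:].mean(axis=0), X[:, 1:].std(axis=0)
    Xs = np.column_stack([X[:, 0], (X[:, 1:] - mu) / sigma])
    theta = np.zeros(Xs.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(Xs @ theta)))       # sigmoid hypothesis
        theta -= alpha * Xs.T @ (p - y) / len(y)  # gradient descent step
    return theta

X = np.array([[1, 2, 3], [1, 4, 6], [1, 7, 3], [1, 5, 5]], dtype=float)
y = np.array([0, 0, 1, 1])                        # hypothetical labels
print(train_logreg(X, y))
```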
0 votes, 1 answer

Is the mini-batch gradient just the sum of online gradients?

I am adapting code for a neural network that does online training to work with mini-batches. Is the mini-batch gradient for a weight (de/dw) just the sum of the gradients for the samples in the mini-batch? Or is it some non-linear…
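For losses that sum (or average) over samples, yes: differentiation is linear, so the mini-batch gradient is exactly the sum (or mean) of the per-sample gradients. A quick numerical check for squared error on a linear model, with random stand-in data:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((8, 3)), rng.standard_normal(8)
w = rng.standard_normal(3)

def sample_grad(i):
    # Gradient of (x_i . w - y_i)^2 with respect to w.
    return 2 * (X[i] @ w - y[i]) * X[i]

batch_grad = 2 * X.T @ (X @ w - y)   # gradient of the summed batch loss
print(np.allclose(batch_grad, sum(sample_grad(i) for i in range(8))))  # True
```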
0 votes, 1 answer

Gradient descent not working as expected

I am using Stochastic Gradient Descent from scikit-learn (http://scikit-learn.org/stable/modules/sgd.html). The example given in the link works like this:
>>> from sklearn.linear_model import SGDClassifier
>>> X = [[0., 0.], [1., 1.]]
>>> y = [0,…
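The excerpt is cut off; the example on the linked scikit-learn page continues roughly as follows (quoted from memory of that page, so treat the exact arguments as an assumption):

```python
from sklearn.linear_model import SGDClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]
clf = SGDClassifier(loss="hinge", penalty="l2")
clf.fit(X, y)
print(clf.predict([[2., 2.]]))  # array([1])
```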
0 votes, 1 answer

Vectorized gradient descent basics

I'm implementing simple gradient descent in Octave but it's not working. Here is the data I'm using:
X = [1 2 3
     1 4 5
     1 6 7]
y = [10 11 12]
theta = [0 0 0]
alpha = 0.001
itr = 50
This is my gradient…
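With this data, the whole batch update vectorizes to one line per iteration. A sketch in Python/NumPy using the question's values; the Octave equivalent of the update is theta = theta - (alpha/m) * X' * (X*theta - y), and note that y as written above is a row vector, which typically needs transposing in Octave:

```python
import numpy as np

X = np.array([[1, 2, 3], [1, 4, 5], [1, 6, 7]], dtype=float)
y = np.array([10, 11, 12], dtype=float)
theta, alpha, iters = np.zeros(3), 0.001, 50
m = len(y)

for _ in range(iters):
    # Simultaneous, fully vectorized update of every parameter.
    theta -= (alpha / m) * X.T @ (X @ theta - y)

print(theta)
```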
0 votes, 1 answer

During Stochastic Gradient Descent, what's the difference between these two ways of updating the hypothesis?

I have a question about updating theta during Stochastic GD. I have two ways to update theta: 1) Use the previous theta to get all the hypotheses for all samples, and then update theta by each sample. Like: hypothese = np.dot(X,…
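The two schemes differ in which theta produces each hypothesis. A sketch contrasting them on invented data (linear model, squared error): way 1 updates from hypotheses precomputed with the stale theta, while way 2, standard SGD, recomputes the hypothesis with the current theta before every update:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((10, 3)), rng.standard_normal(10)
alpha = 0.1

# Way 1: all hypotheses computed once from the previous theta.
theta1 = np.zeros(3)
h = X @ theta1                        # frozen before the sweep
for i in range(len(y)):
    theta1 -= alpha * (h[i] - y[i]) * X[i]

# Way 2: true SGD, hypothesis recomputed with the current theta.
theta2 = np.zeros(3)
for i in range(len(y)):
    h_i = X[i] @ theta2               # uses the freshest parameters
    theta2 -= alpha * (h_i - y[i]) * X[i]

print(theta1, theta2)                 # the two sweeps generally differ
```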
0 votes, 1 answer

missing value where TRUE/FALSE needed in R

When I run the following code without commenting out gr.ascent(MMSE, 0.5, verbose=TRUE), I receive the error Error in b1 * x : 'b1' is missing, but when I comment that line I receive the following error when testing MMSE with these arguments…
Mona Jalal
0 votes, 1 answer

L-BFGS from RISO not working

I am testing RISO's implementation of the L-BFGS algorithm for function minimization in logistic regression in Java. Here is the link to the class that I am using. To test the library, I am trying to minimize the function: f(x) = 2*(x1^2) + 4*x2 +…
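The RISO call itself can't be reproduced from the truncated excerpt, but cross-checking the same minimization against an independent L-BFGS implementation (SciPy's here, not RISO's) is a quick way to localize a bug to the objective or the gradient; the function below is a convex stand-in, since the question's f is cut off:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return 2 * x[0]**2 + 4 * (x[1] - 1)**2   # stand-in convex objective

def grad(x):
    return np.array([4 * x[0], 8 * (x[1] - 1)])

res = minimize(f, x0=np.zeros(2), jac=grad, method="L-BFGS-B")
print(res.x)  # should be near [0, 1]
```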
0 votes, 3 answers

Incorrect Results from Gradient Descent in Matlab

I'm taking the course in MATLAB, and I have done a gradient descent implementation, but it gives incorrect results. The code:
for iter = 1:num_iters
    sumTheta1 = 0;
    sumTheta2 = 0;
    for s = 1:m
        sumTheta1 = theta(1) + theta(2) .* X(s,2) - y(s);
…
Pedro.Alonso
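In the excerpt above, sumTheta1 is assigned rather than accumulated inside the inner loop, which alone would produce incorrect results. A corrected accumulate-then-update step, sketched in Python with invented data standing in for the course's:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + feature
y = np.array([2.0, 3.0, 4.0])                        # satisfies y = 1 + x
theta = np.zeros(2)
alpha, m = 0.1, len(y)

for _ in range(1000):
    err = X @ theta - y                  # residual for every sample at once
    sum1 = np.sum(err)                   # accumulated, not overwritten
    sum2 = np.sum(err * X[:, 1])
    theta -= (alpha / m) * np.array([sum1, sum2])  # simultaneous update

print(theta)  # should approach [1, 1]
```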