Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a (local) minimum of a function. It iteratively calculates the partial derivatives (gradient) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
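
In code, each iteration evaluates the gradient of the cost at the current parameters and takes a small step in the opposite direction. A minimal illustrative sketch in Python (fitting y ≈ w*x + b by least squares; the learning rate and iteration count here are arbitrary choices, not part of the tag wiki):

    import numpy as np

    def gradient_descent(x, y, lr=0.05, iterations=2000):
        # Fit y ≈ w*x + b by minimizing the mean squared error.
        w, b = 0.0, 0.0
        n = len(x)
        for _ in range(iterations):
            error = w * x + b - y                    # residuals of the current model
            grad_w = (2.0 / n) * np.dot(error, x)    # dMSE/dw
            grad_b = (2.0 / n) * error.sum()         # dMSE/db
            w -= lr * grad_w                         # step against the gradient
            b -= lr * grad_b
        return w, b

    # Usage: recover slope ≈ 3 and intercept ≈ 1 from noisy data.
    x = np.linspace(0.0, 1.0, 100)
    y = 3.0 * x + 1.0 + 0.05 * np.random.randn(100)
    w, b = gradient_descent(x, y)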


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
0 votes, 1 answer

FailedPreconditionError while trying to use RMSPropOptimizer on tensorflow

I am trying to use the RMSPropOptimizer for minimizing loss. Here's the part of the code that is relevant: import tensorflow as tf #build large convnet... #... opt = tf.train.RMSPropOptimizer(learning_rate=0.0025, decay=0.95) #do stuff to get…
aphdstudent
0 votes, 2 answers

Gradient Descent algorithm taking long time to complete - Efficiency - Python

I am trying to implement the gradient descent algorithm in Python; the following is my code: def grad_des(xvalues, yvalues, R=0.01, epsilon = 0.0001, MaxIterations=1000): xvalues= np.array(xvalues) yvalues = np.array(yvalues) length =…
haimen
0 votes, 1 answer

Trying to understand code that computes the gradient w.r.t. the input for LogSoftMax in Torch

Code comes from: https://github.com/torch/nn/blob/master/lib/THNN/generic/LogSoftMax.c I don't see how this code is computing the gradient w.r.t. the input for the module LogSoftMax. What I'm confused about is what the two for loops are doing. for…
lars
0 votes, 1 answer

Machine Learning - SVM - How to calculate bias when calculating vector W?

I am writing code for the SVM primal problem that uses SGD (stochastic subgradient descent) to optimize the vector W. The classification method is sign(w*x + bias). My question is how to find the best bias for it? I guess that it has to be done during the W…
zardav
0 votes, 1 answer

Can FTRL be applied to linear least squares, or is it just for logistic regression models?

I'm exploring follow-the-regularized-leader FTRL proximal gradient descent: paper, reference implementation. Everywhere FTRL is mentioned, the loss surface for the gradient descent is the LogLoss, and the model for prediction is Logistic regression.…
ihadanny
0 votes, 1 answer

Compute updates in Theano after N number of loss calculations

I've constructed a LSTM recurrent NNet using lasagne that is loosely based on the architecture in this blog post. My input is a text file that has around 1,000,000 sentences and a vocabulary of 2,000 word tokens. Normally, when I construct…
o-90
0 votes, 0 answers

Is it possible to implement gradient checking in a vectorized way when implementing neural network?

For example, can I add delta to all dimensions of w from y = w.dot(x) + b and calculate dw in one go?
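
A minimal sketch of how such a gradient check is commonly written (illustrative only, not code from the question): the analytic gradient dw is computed for all dimensions in one vectorized expression, while the finite-difference side still perturbs one coordinate at a time, and the two are compared elementwise.

    import numpy as np

    def numerical_grad(f, w, delta=1e-5):
        # Central-difference estimate of the gradient of a scalar function f at w,
        # perturbing one coordinate at a time.
        grad = np.zeros_like(w)
        for i in range(w.size):
            e = np.zeros_like(w)
            e.flat[i] = delta
            grad.flat[i] = (f(w + e) - f(w - e)) / (2.0 * delta)
        return grad

    # Toy check for loss = 0.5 * ||w.dot(x) + b - t||^2 on one sample.
    x = np.random.randn(4)
    t = np.random.randn(3)
    b = np.random.randn(3)
    w = np.random.randn(3, 4)

    loss = lambda w_: 0.5 * np.sum((w_.dot(x) + b - t) ** 2)
    analytic_dw = np.outer(w.dot(x) + b - t, x)   # all dimensions of dw at once
    numeric_dw = numerical_grad(loss, w)
    rel_err = np.abs(analytic_dw - numeric_dw).max() / (np.abs(analytic_dw).max() + 1e-12)
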
0 votes, 1 answer

theano GRU rnn adam optimizer

Technical information: OS: Mac OS X 10.9.5 IDE: Eclipse Mars.1 Release (4.5.1), with PyDev and Anaconda interpreter (grammar version 3.4) GPU: NVIDIA GeForce GT 650M Libs: numpy, aeosa, Sphinx-1.3.1, Theano 0.7, nltk-3.1 My background: I am very new…
0 votes, 1 answer

Represent Linear Regression features in Gradient Descent numerically

The following piece of Python code works well for performing gradient descent: def gradientDescent(x, y, theta, alpha, m, numIterations): xTrans = x.transpose() for i in range(0, numIterations): hypothesis = np.dot(x, theta) …
Saurabh Verma
0 votes, 1 answer

Treating missing values as really missing in Vowpal Wabbit

Is there a way to correctly represent missing values in VW input format -- not to impute with the mean or median, not to set them to 0 or any other constant, but to treat them as really missing, so that SGD and FTRL-Proximal algorithms could exclude…
kurtosis
0 votes, 1 answer

Gradient Descent with multiple variables without matrices

I'm new to Matlab and machine learning, and I tried to write a gradient descent function without using matrices. m is the number of examples in my training set, n is the number of features for each example. The function gradientDescentMulti takes 5…
Arthur
0 votes, 0 answers

Gradient Descent With Smoothness constraints

I have a noisy image Y and a known kernel H. I need to estimate a denoised image X such that the gradient of X is also minimised. J = ||Y - HX||^2 + Alpha * SmoothnessConstraint(X); SmoothnessConstraint(X) = L1norm(||Grad(X)||). How do I estimate the…
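
A rough sketch of one way to run gradient descent on such an objective (illustrative only, not the asker's setup): it assumes H acts as a 2-D 'same' convolution and replaces the L1 term with the smooth surrogate sqrt(g^2 + eps) so an ordinary gradient step applies; the step size, iteration count, and boundary handling are arbitrary choices.

    import numpy as np
    from scipy.signal import convolve2d

    def estimate_x(Y, H, alpha=0.1, lr=1e-3, iters=500, eps=1e-8):
        # Minimize ||Y - H*X||^2 + alpha * sum(sqrt(grad(X)^2 + eps)) by gradient descent.
        X = Y.astype(float)
        H_adj = H[::-1, ::-1]                                      # adjoint of the convolution: flipped kernel
        for _ in range(iters):
            resid = convolve2d(X, H, mode='same') - Y
            g_data = 2.0 * convolve2d(resid, H_adj, mode='same')   # gradient of the data term
            dx = np.diff(X, axis=1, append=X[:, -1:])              # forward differences of X
            dy = np.diff(X, axis=0, append=X[-1:, :])
            px = dx / np.sqrt(dx ** 2 + eps)                       # derivative of the smoothed |g|
            py = dy / np.sqrt(dy ** 2 + eps)
            g_smooth = -(np.diff(px, axis=1, prepend=px[:, :1])    # (negative) divergence
                         + np.diff(py, axis=0, prepend=py[:1, :]))
            X -= lr * (g_data + alpha * g_smooth)
        return X
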
0 votes, 0 answers

Adjusting proto file for Caffe

I'm trying to modify caffe.proto in order to add 2 new fields to SolverParameter. The two lines I add, at the very end of the SolverParameter message, are: optional int32 start_lr_policy = 36; // Iteration to start CLR policy described in…
user1245262
0 votes, 1 answer

SGD with L2 regularization in mllib

I am having difficulty reading open source mllib code for SGD with L2 regularization. The code is class SquaredL2Updater extends Updater { override def compute( weightsOld: Vector, gradient: Vector, stepSize: Double, iter: Int, regParam:…
bhomass
0 votes, 1 answer

How does mllib calculate the gradient?

Need an mllib expert to help explain the linear regression code. In LeastSquaresGradient.compute override def compute( data: Vector, label: Double, weights: Vector, cumGradient: Vector): Double = { val diff = dot(data, weights) -…