Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a local minimum of a function. It iteratively evaluates the function's partial derivatives (its gradient) and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
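
As a concrete illustration of the rule above, a minimal sketch (the quadratic objective, starting point, step size, and iteration count are arbitrary choices for the example, not part of the algorithm's definition):

    # Gradient descent on f(x, y) = x^2 + y^2, whose gradient is (2x, 2y).
    def gradient_descent(grad, start, learning_rate=0.1, steps=100):
        point = list(start)
        for _ in range(steps):
            g = grad(point)
            # Step proportional to the NEGATIVE of the gradient at the current point.
            point = [p - learning_rate * gi for p, gi in zip(point, g)]
        return point

    print(gradient_descent(lambda p: [2 * p[0], 2 * p[1]], start=[3.0, -4.0]))  # ~[0.0, 0.0]
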


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.


1428 questions
4 votes, 1 answer

Is there a Python library where I can import a gradient descent function/method?

One way to do gradient descent in Python is to code it myself. However, given how popular a concept it is in machine learning, I was wondering if there is a Python library that I can import that gives me a gradient descent method (preferably…
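
A hedged pointer, not from the question itself: SciPy (if an external dependency is acceptable) ships gradient-based minimizers via scipy.optimize.minimize, where the gradient is supplied through the jac argument:

    import numpy as np
    from scipy.optimize import minimize

    def f(w):                      # toy objective with minimum at w = [3, 3]
        return np.sum((w - 3.0) ** 2)

    def grad_f(w):                 # its gradient, passed to the optimizer via jac=
        return 2.0 * (w - 3.0)

    result = minimize(f, x0=np.zeros(2), jac=grad_f, method="CG")
    print(result.x)                # close to [3., 3.]
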
4 votes, 1 answer

Gradient Descent for Linear Regression Exploding

I am trying to implement gradient descent for linear regression using this resource: https://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/ My problem is that my weights are exploding (increasing exponentially) and essentially…
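
Exploding weights in a hand-rolled linear-regression loop are usually a step-size problem; a hedged sketch with made-up numbers (not the asker's data) showing how a too-large learning rate diverges while a smaller one settles down:

    import numpy as np

    X = np.c_[np.ones(4), np.array([100., 95., 92., 88.])]   # unscaled feature, made up
    y = np.array([97., 94., 90., 86.])

    def gd(X, y, lr, steps=20):
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
            w -= lr * grad
        return w

    print(gd(X, y, lr=1e-3))   # weights grow every step: the update overshoots
    print(gd(X, y, lr=1e-5))   # small enough step converges; scaling the feature helps more
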
4 votes, 3 answers

Pytorch: Gradient of output w.r.t parameters

I'm interested in finding the gradient of a neural network's output with respect to the parameters (weights and biases). More specifically, assume I have the following neural network structure [6,4,3,1]. The input sample size is 20. What I'm interested…
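
A hedged sketch of one way to get these gradients with torch.autograd.grad; the [6, 4, 3, 1] layer sizes and the batch of 20 mirror the question, everything else (activations, the sum reduction) is an assumption:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(6, 4), nn.Tanh(),
                          nn.Linear(4, 3), nn.Tanh(),
                          nn.Linear(3, 1))
    x = torch.randn(20, 6)                 # 20 input samples
    out = model(x).sum()                   # reduce to a scalar for a single grad call
    params = list(model.parameters())
    grads = torch.autograd.grad(out, params)
    for p, g in zip(params, grads):
        print(p.shape, g.shape)            # each gradient matches its parameter's shape

For per-sample gradients, one would repeat the call per sample or use vmap-style utilities instead of summing over the batch.
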
4 votes, 1 answer

Implementing back propagation using numpy and python for cleveland dataset

I wanted to predict heart disease using the backpropagation algorithm for neural networks. For this I used the UCI heart disease data set linked here: processed cleveland. To do this, I used the code found on the following blog: Build a flexible Neural…
asked by Tarun Khare (1,447)
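
For reference, a hedged, generic sketch of backpropagation in NumPy (this is not the blog's code or the Cleveland data, just stand-in arrays of a similar shape): one hidden layer, sigmoid activations, squared error, plain gradient descent.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((20, 13))             # stand-in for the Cleveland features
    y = rng.integers(0, 2, (20, 1))      # stand-in for the 0/1 disease label

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W1 = rng.normal(0, 0.1, (13, 8))     # input -> hidden
    W2 = rng.normal(0, 0.1, (8, 1))      # hidden -> output
    lr = 0.5
    for _ in range(1000):
        h = sigmoid(X @ W1)                      # forward pass
        out = sigmoid(h @ W2)
        d_out = (out - y) * out * (1 - out)      # backward pass: chain rule per layer
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out) / len(X)        # gradient descent updates
        W1 -= lr * (X.T @ d_h) / len(X)
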
4 votes, 1 answer

Differentiate gradients

Is there a way to differentiate gradients in PyTorch? For example, I can do this in TensorFlow: from pylab import * import tensorflow as tf tf.reset_default_graph() sess = tf.InteractiveSession() def gradient_descent( loss_fnc, w, max_its, lr): …
asked by firdaus (541)
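
For the record, a hedged sketch of the PyTorch counterpart: torch.autograd.grad with create_graph=True keeps the gradient itself differentiable, so it can be differentiated again.

    import torch

    w = torch.tensor([3.0], requires_grad=True)
    loss = (w ** 2).sum()                                       # toy loss
    (g,) = torch.autograd.grad(loss, w, create_graph=True)      # d(loss)/dw = 2w, still differentiable
    (g2,) = torch.autograd.grad(g.sum(), w)                     # d2(loss)/dw2 = 2
    print(g.item(), g2.item())                                  # 6.0 2.0
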
4 votes, 0 answers

How to implement custom gradient of a tensor using eager execution in TensorFlow

In this (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/g3doc/guide.md) tutorial, a method of implementing a custom gradient function is provided. @tfe.custom_gradient def log1pexp(x): e = tf.exp(x) def…
asked by Sirui Lu (41)
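
A hedged note: in current TensorFlow the decorator is spelled tf.custom_gradient (the tfe.custom_gradient form in the linked contrib/eager guide is the older spelling); a minimal sketch of the same log1pexp example:

    import tensorflow as tf

    @tf.custom_gradient
    def log1pexp(x):
        e = tf.exp(x)
        def grad(dy):
            return dy * (1 - 1 / (1 + e))     # stable gradient of log(1 + e^x)
        return tf.math.log(1 + e), grad

    x = tf.constant(100.0)
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = log1pexp(x)
    print(tape.gradient(y, x).numpy())        # 1.0; the naive gradient would be nan here
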
4 votes, 0 answers

how to use iter_size in caffe

I don't know the exact meaning of 'iter_size' in the Caffe solver, though I have googled a lot. It always says that 'iter_size' is a way to effectively increase the batch size without requiring extra GPU memory. Could I understand it as this: If set…
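
For intuition, a hedged analogy in PyTorch (not Caffe code): iter_size amounts to accumulating gradients over iter_size mini-batches before a single weight update, so the effective batch size is batch_size * iter_size without holding the larger batch in GPU memory.

    import torch

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    iter_size = 4                                        # accumulate 4 mini-batches
    opt.zero_grad()
    for _ in range(iter_size):
        x, y = torch.randn(8, 10), torch.randn(8, 1)     # batch_size = 8
        loss = torch.nn.functional.mse_loss(model(x), y) / iter_size   # average across the group
        loss.backward()                                  # gradients accumulate in .grad
    opt.step()                                           # one update, effective batch of 32
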
4 votes, 2 answers

Simple gradient descent using mxnet

I'm trying to use MXNet's gradient descent optimizers to minimize a function. The equivalent example in Tensorflow would be: import tensorflow as tf x = tf.Variable(2, name='x', dtype=tf.float32) log_x = tf.log(x) log_x_squared =…
asked by user3363678 (178)
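
A hedged sketch of the MXNet side using the imperative autograd API, mirroring the TF snippet's (log x)^2 objective (a gluon.Trainer or mx.optimizer would wrap the same update step):

    import mxnet as mx
    from mxnet import autograd, nd

    x = nd.array([2.0])
    x.attach_grad()
    for _ in range(200):
        with autograd.record():
            loss = nd.log(x) ** 2
        loss.backward()
        x[:] = x - 0.1 * x.grad      # plain gradient descent step
    print(x.asnumpy())               # approaches 1.0, where (log x)^2 is minimal
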
4 votes, 1 answer

Eligibility traces in TensorFlow

According to Sutton's book, Reinforcement Learning: An Introduction, the update equation for the network weights is given by an expression in which e_t is the eligibility trace. This is similar to a gradient descent update with an extra e_t factor. Can this eligibility trace…
asked by nikpod (1,238)
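
The referenced equation does not survive in the excerpt; assuming it is the standard semi-gradient TD(lambda) update from Sutton and Barto, a hedged NumPy sketch of how the trace replaces the instantaneous gradient in the descent step:

    import numpy as np

    alpha, gamma, lam = 0.1, 0.99, 0.8
    w = np.zeros(4)                  # linear value function v(s) = w . phi(s)
    e = np.zeros_like(w)             # eligibility trace

    def td_lambda_step(phi_s, reward, phi_s_next, w, e):
        delta = reward + gamma * w @ phi_s_next - w @ phi_s   # TD error
        e[:] = gamma * lam * e + phi_s      # decayed trace plus grad of v(s) w.r.t. w
        w += alpha * delta * e              # gradient-descent-like update using the trace
        return w, e

    w, e = td_lambda_step(np.array([1., 0., 0., 0.]), 1.0, np.array([0., 1., 0., 0.]), w, e)
    print(w, e)
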
4 votes, 1 answer

how does xgboost enforce monotonicity constraints

I would like to know how xgboost enforces monotonic constraints while building the tree model. So far, by reading the code, I have understood that it has something to do with the weights of each node, but I am not able to understand why this approach…
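
A hedged usage sketch (roughly, the tree builder bounds child leaf weights so a split cannot reverse the requested direction): constraints are declared per feature with monotone_constraints, +1 for increasing, -1 for decreasing, 0 for unconstrained.

    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.random((200, 2))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, 200)   # monotone toy target

    model = xgb.XGBRegressor(n_estimators=50, monotone_constraints="(1,-1)")
    model.fit(X, y)   # predictions non-decreasing in feature 0, non-increasing in feature 1
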
4 votes, 1 answer

Tensorflow Optimizers - multiple loss values passed to minimize()?

My first time using Tensorflow on the MNIST dataset, I had a really simple bug where I forgot to take the mean of my error values before passing them to the optimizer. In other words, instead of loss =…
asked by ejlu (41)
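
A hedged illustration of what happens in that case: when the tensor handed to the gradient computation is not a scalar, TensorFlow differentiates its implicit sum, so skipping reduce_mean scales the gradient by the batch size instead of averaging it.

    import tensorflow as tf

    w = tf.Variable(1.0)
    x = tf.constant([1.0, 2.0, 3.0, 4.0])
    with tf.GradientTape(persistent=True) as tape:
        per_example = (w * x) ** 2               # vector of per-example losses
        mean_loss = tf.reduce_mean(per_example)
    print(tape.gradient(per_example, w).numpy())   # 60.0 = gradient of the sum
    print(tape.gradient(mean_loss, w).numpy())     # 15.0 = averaged gradient
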
4 votes, 1 answer

How does changing batch size result in different prediction times?

I trained a data set (~8000 images) using Caffe and a batch size of 5 with the AlexNet network. This results in a prediction time of 800-900 ms. Then I changed the batch size to 56 (the maximum my machine can support) and the prediction time reduced to…
4 votes, 2 answers

Gradient Descent: thetas not converging

I'm trying to figure out gradient descent with Octave. With each iteration, my thetas get exponentially larger. I'm not sure what the issue is as I'm copying another function directly. Here are my matrices: X = 1 98 1 94 1 93 1 88 1…
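
A hedged sketch of the usual remedy (shown in Python rather than Octave, with the feature values visible in the excerpt and made-up targets): normalize the feature column before the loop; with raw values near 100 the same step size overshoots and theta grows each iteration.

    import numpy as np

    x = np.array([98., 94., 93., 88.])            # feature values visible in the excerpt
    y = np.array([97., 95., 91., 87.])            # made-up targets for illustration
    x_norm = (x - x.mean()) / x.std()             # feature scaling
    X = np.c_[np.ones_like(x_norm), x_norm]

    theta = np.zeros(2)
    alpha = 0.1
    for _ in range(500):
        theta -= alpha * X.T @ (X @ theta - y) / len(y)   # standard batch update
    print(theta)   # converges; with the raw column the same alpha makes theta blow up
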
4 votes, 3 answers

Why can't a deep NN approximate a simple ln(x) function?

I have created an ANN with two ReLU hidden layers + a linear activation layer and am trying to approximate the simple ln(x) function. I can't do this well. I am confused, because ln(x) in the range x:[0.0-1.0] should be approximated without problems (I am…
4 votes, 1 answer

Gradient Descent algorithm not converging in Haskell

I am trying to implement the gradient descent algorithm in Andrew Ng's ML course. After reading in the data, I try to implement the following, updating my list of theta values 1000 times, with the expectation of some convergence. The algorithm in…