Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function f that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
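In symbols, one standard way to write the update (with α the step size and x_n the current point):

```latex
x_{n+1} = x_n - \alpha \nabla f(x_n)
```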

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
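A minimal sketch of the idea in Python/NumPy; the quadratic objective, step size, and iteration count below are illustrative choices, not part of the tag wiki:

```python
import numpy as np

def gradient_descent(grad, x0, step_size=0.1, n_iters=100):
    """Generic gradient descent: repeatedly step against the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step_size * grad(x)  # step proportional to the negative gradient
    return x

# Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is
# (2(x - 3), 2(y + 1)); the minimum is at (3, -1).
grad_f = lambda v: np.array([2 * (v[0] - 3), 2 * (v[1] + 1)])
print(gradient_descent(grad_f, x0=[0.0, 0.0]))  # approximately [3. -1.]
```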


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
10 votes, 2 answers

Tensorflow: How to write op with gradient in python?

I would like to write a TensorFlow op in python, but I would like it to be differentiable (to be able to compute a gradient). This question asks how to write an op in python, and the answer suggests using py_func (which has no gradient): Tensorflow:…
Alex I
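One way this is commonly handled in modern TensorFlow (not necessarily the answer given to this question) is tf.custom_gradient, which pairs a Python forward computation with an explicit gradient function; the clipping here is just an example of a hand-written gradient:

```python
import tensorflow as tf

@tf.custom_gradient
def clipped_square(x):
    y = tf.square(x)
    def grad(dy):
        # Hand-written gradient: d(x^2)/dx = 2x, clipped for illustration
        return dy * tf.clip_by_value(2.0 * x, -1.0, 1.0)
    return y, grad

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = clipped_square(x)
print(tape.gradient(y, x))  # 1.0, because 2*3 = 6 is clipped to 1
```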
10 votes, 1 answer

Stochastic gradient descent from gradient descent implementation in R

I have a working implementation of multivariable linear regression using gradient descent in R. I'd like to see if I can use what I have to run a stochastic gradient descent. I'm not sure if this is really inefficient or not. For example, for each…
Daina
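The core change from batch to stochastic gradient descent, sketched in Python/NumPy rather than R for consistency with the other examples on this page (data and learning rate invented): update the parameters from one shuffled example at a time instead of from the full-dataset gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]  # design matrix with intercept
theta_true = np.array([1.0, 2.0, -0.5])
y = X @ theta_true + 0.01 * rng.normal(size=100)

theta = np.zeros(3)
lr = 0.05
for epoch in range(50):
    for i in rng.permutation(len(y)):   # visit examples in random order
        err = X[i] @ theta - y[i]       # residual for a single example
        theta -= lr * err * X[i]        # single-example gradient step
print(theta)                            # approximately [1.0, 2.0, -0.5]
```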
10 votes, 1 answer

Machine learning - Linear regression using batch gradient descent

I am trying to implement batch gradient descent on a data set with a single feature and multiple training examples (m). When I try using the normal equation, I get the right answer, but the wrong one with this code below, which performs batch gradient…
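A common way to debug this situation, sketched with invented data (the missing 1/m factor and unscaled features are frequent culprits): fit the same matrix with both the normal equation and batch gradient descent and compare.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 50
X = np.c_[np.ones(m), rng.uniform(0, 10, m)]   # one feature plus intercept column
y = X @ np.array([4.0, 0.7]) + rng.normal(size=m)

# Normal equation: theta = (X^T X)^{-1} X^T y
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the same cost J = (1/2m) ||X theta - y||^2
theta = np.zeros(2)
alpha = 0.01
for _ in range(20000):
    theta -= alpha * (X.T @ (X @ theta - y)) / m   # full-batch gradient; note the 1/m
print(theta_ne, theta)                             # the two should agree closely
```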
9 votes, 3 answers

Is my implementation of stochastic gradient descent correct?

I am trying to develop stochastic gradient descent, but I don't know if it is 100% correct. The cost generated by my stochastic gradient descent algorithm is sometimes very far from the one generated by fminunc or batch gradient descent. while batch…
9 votes, 4 answers

Fast gradient-descent implementation in a C++ library?

I'm looking to run a gradient descent optimization to minimize the cost of an instantiation of variables. My program is very computationally expensive, so I'm looking for a popular library with a fast implementation of GD. What is the recommended…
Jim
8 votes, 2 answers

Accumulating Gradients

I want to accumulate the gradients before I do a backward pass. So I am wondering what the right way of doing it is. According to this article it's: model.zero_grad() # Reset gradients tensors for i, (inputs, labels) in…
sachinruk
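The pattern under discussion, as a self-contained PyTorch sketch (the model, data, and accumulation count are placeholders): gradients from .backward() add into .grad until they are explicitly zeroed, so stepping every N batches simulates an N-times-larger batch.

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
accum_steps = 4                                          # assumed accumulation count

optimizer.zero_grad()
for i in range(16):                                      # stand-in for a real data loader
    inputs, labels = torch.randn(8, 10), torch.randn(8, 1)
    loss = loss_fn(model(inputs), labels) / accum_steps  # scale so gradients average
    loss.backward()                                      # accumulates into .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()                                 # one update per accum_steps batches
        optimizer.zero_grad()
```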
8 votes, 2 answers

Gradient descent implementation python - contour lines

As a self study exercise I am trying to implement gradient descent on a linear regression problem from scratch and plot the resulting iterations on a contour plot. My gradient descent implementation gives the correct result (tested with Sklearn)…
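A sketch of the usual plotting approach (cost surface and step size invented): record the parameter iterates during descent, then overlay the trajectory on a contour plot of the cost.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy cost surface: J(a, b) = a^2 + 3*b^2, with gradient (2a, 6b)
theta = np.array([4.0, 2.5])
path = [theta.copy()]
for _ in range(30):
    theta -= 0.1 * np.array([2 * theta[0], 6 * theta[1]])
    path.append(theta.copy())
path = np.array(path)

a, b = np.meshgrid(np.linspace(-5, 5, 100), np.linspace(-3, 3, 100))
plt.contour(a, b, a**2 + 3 * b**2, levels=20)
plt.plot(path[:, 0], path[:, 1], "o-")   # the descent trajectory
plt.show()
```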
8 votes, 1 answer

TensorFlow average gradients over several batches

This is a possible duplicate of Tensorflow: How to get gradients per instance in a batch?. I ask it anyway, because there has not been a satisfying answer and the goal here is a bit different. I have a very big network that I can fit on my GPU but…
niko
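With eager TensorFlow 2 (rather than the graph-mode setup this question dates from), one way to average gradients over several micro-batches before applying them is to accumulate GradientTape results; shapes and data here are placeholders:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build(input_shape=(None, 10))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
n_micro = 4

accum = [tf.zeros_like(v) for v in model.trainable_variables]
for _ in range(n_micro):
    x, y = tf.random.normal((8, 10)), tf.random.normal((8, 1))
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    accum = [a + g for a, g in zip(accum, grads)]

avg = [a / n_micro for a in accum]    # average over micro-batches, then apply once
optimizer.apply_gradients(zip(avg, model.trainable_variables))
```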
8 votes, 1 answer

Gradient of a Loss Function for an SVM

I'm working on this class on convolutional neural networks. I've been trying to implement the gradient of a loss function for an svm and (I have a copy of the solution) I'm having trouble understanding why the solution is correct. On this page it…
David
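For reference, the multi-class hinge (SVM) loss used in that course and its analytic gradient, sketched vectorized in NumPy; the function name, margin of 1, and shapes follow the usual CS231n conventions rather than the question's exact code, and regularization is omitted:

```python
import numpy as np

def svm_loss_and_grad(W, X, y, delta=1.0):
    """W: (D, C) weights, X: (N, D) data, y: (N,) integer labels."""
    N = X.shape[0]
    scores = X @ W                                # (N, C)
    correct = scores[np.arange(N), y][:, None]    # score of the true class
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(N), y] = 0                  # the true class contributes no loss
    loss = margins.sum() / N

    # Gradient: +x_i for each class that violates the margin, and
    # -(number of violations) * x_i in the true-class column.
    mask = (margins > 0).astype(float)            # (N, C)
    mask[np.arange(N), y] = -mask.sum(axis=1)
    grad = X.T @ mask / N                         # (D, C)
    return loss, grad
```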
7 votes, 1 answer

Loss with custom backward function in PyTorch - exploding loss in simple MSE example

Before working on something more complex, where I knew I would have to implement my own backward pass, I wanted to try something nice and simple. So, I tried to do linear regression with mean squared error loss using PyTorch. This went wrong (see…
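The standard vehicle for a hand-written backward pass in PyTorch is torch.autograd.Function; here is a minimal MSE example (the class name and shapes are illustrative, not the asker's code). A common cause of the exploding loss described here is omitting the 1/N scale factor in the backward pass.

```python
import torch

class MyMSE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, pred, target):
        ctx.save_for_backward(pred, target)
        return ((pred - target) ** 2).mean()

    @staticmethod
    def backward(ctx, grad_output):
        pred, target = ctx.saved_tensors
        # d/dpred of mean((pred - target)^2) = 2 * (pred - target) / N
        n = pred.numel()
        grad_pred = grad_output * 2.0 * (pred - target) / n
        return grad_pred, None                    # no gradient needed for target

pred = torch.randn(5, requires_grad=True)
target = torch.randn(5)
MyMSE.apply(pred, target).backward()
print(torch.allclose(pred.grad, 2 * (pred.detach() - target) / 5))  # True
```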
7 votes, 2 answers

Why torch.sum() before doing .backward()?

I can see what this code below from this video is trying to do. But the sum from y=torch.sum(x**2) confuses me. With the sum operation, y becomes a tensor with one single value. As I understand it, .backward() calculates derivatives, so why would we want…
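The short answer: .backward() needs a scalar (or an explicit grad_output) because autograd computes vector-Jacobian products; summing is a convenient way to get per-element derivatives when each output depends on only one input. A tiny sketch:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.sum(x ** 2)          # scalar: y = x1^2 + x2^2 + x3^2
y.backward()                   # dy/dx_i = 2 * x_i
print(x.grad)                  # tensor([2., 4., 6.])

# Equivalent without sum: pass the implicit all-ones gradient explicitly
x.grad = None
z = x ** 2                     # non-scalar output
z.backward(torch.ones_like(z))
print(x.grad)                  # tensor([2., 4., 6.])
```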
7 votes, 1 answer

Correct backpropagation in simple perceptron

Given the simple OR gate problem: or_input = np.array([[0,0], [0,1], [1,0], [1,1]]) or_output = np.array([[0,1,1,1]]).T If we train a simple single-layered perceptron (without backpropagation), we could do something like this: import numpy as…
alvas
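A minimal single-layer sketch with a sigmoid unit and the chain-rule update, in the same spirit as the question (the cross-entropy delta, learning rate, and epoch count are illustrative choices, not the question's code):

```python
import numpy as np

or_input = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
or_output = np.array([[0, 1, 1, 1]]).T

sigmoid = lambda z: 1 / (1 + np.exp(-z))

rng = np.random.default_rng(42)
W = rng.normal(size=(2, 1))
b = 0.0
lr = 1.0
for _ in range(5000):
    pred = sigmoid(or_input @ W + b)              # forward pass
    delta = pred - or_output                      # cross-entropy gradient w.r.t. pre-activation
    W -= lr * or_input.T @ delta / len(or_input)  # chain rule back to the weights
    b -= lr * delta.mean()
print(np.round(sigmoid(or_input @ W + b)))        # approximately [[0], [1], [1], [1]]
```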
7 votes, 1 answer

Backpropagation with Momentum

I'm following this tutorial for implementing the Backpropagation algorithm. However, I am stuck at implementing momentum for this algorithm. Without Momentum, this is the code for weight update method: def update_weights(network, row, l_rate): …
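The usual change for momentum, sketched generically in Python (the class name and hyperparameters are assumptions; the tutorial's own variable names may differ): keep a per-weight velocity that mixes the previous velocity with the current gradient.

```python
import numpy as np

class MomentumSGD:
    """Heavy-ball momentum: v <- m*v - lr*grad; w <- w + v."""
    def __init__(self, l_rate=0.1, momentum=0.9):
        self.l_rate, self.momentum = l_rate, momentum
        self.velocity = {}

    def update(self, name, w, grad):
        v = self.momentum * self.velocity.get(name, np.zeros_like(w))
        v -= self.l_rate * grad          # blend previous velocity with new gradient
        self.velocity[name] = v
        return w + v

# Usage on a toy 1-D quadratic f(w) = w^2, whose gradient is 2w:
opt = MomentumSGD()
w = np.array(5.0)
for _ in range(200):
    w = opt.update("w", w, 2 * w)
print(w)                                 # close to 0
```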
7 votes, 1 answer

Where is the code for gradient descent?

Running some experiments with TensorFlow, I want to look at the implementation of some functions just to see exactly how some things are done. I started with the simple case of tf.train.GradientDescentOptimizer and downloaded the zip of the full source…
rwallace
7 votes, 3 answers

Is Stochastic gradient descent a classifier or an optimizer?

I am new to Machine Learning and I am trying to analyze the classification algorithm for a project of mine. I came across SGDClassifier in the sklearn library. But a lot of papers have referred to SGD as an optimization technique. Can someone please…
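Both readings are right: SGD is the optimizer, and sklearn's SGDClassifier is a family of linear classifiers fitted with it; the loss parameter picks the model. A short sketch (parameter names follow recent scikit-learn versions):

```python
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

# loss="hinge" gives a linear SVM; loss="log_loss" gives logistic regression.
# Either way, the *optimizer* doing the fitting is stochastic gradient descent.
clf = SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```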