Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates the partial derivatives (the gradient) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.
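
In symbols, the update the description above alludes to is conventionally written as follows (a standard textbook form; the symbols θ for the parameters, α for the step size or learning rate, and J for the error/cost function are conventional names, not anything fixed by the tag):

    \theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta)

For the model-fitting use mentioned above, J is typically an average of per-example errors, for instance the least-squares cost

    J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)^2

where h_\theta is the parameterized model and (x^{(i)}, y^{(i)}) are the m data points.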

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of a function's parameters (coefficients) that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
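
As a minimal illustration of the above, here is a short Python/NumPy sketch (all names are illustrative) that fits a two-parameter linear model by repeatedly stepping along the negative gradient of its squared-error cost:

    import numpy as np

    # Minimal sketch: batch gradient descent for least-squares linear regression.
    def gradient_descent(X, y, lr=0.1, n_steps=1000):
        m, n = X.shape
        theta = np.zeros(n)                  # parameters being fitted
        for _ in range(n_steps):
            residuals = X @ theta - y        # model error on the whole data set
            grad = X.T @ residuals / m       # gradient of the (halved) mean squared error
            theta -= lr * grad               # step proportional to the negative gradient
        return theta

    # Usage: recover a known line y = 1 + 2x (bias handled by a column of ones).
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100)
    X = np.column_stack([np.ones_like(x), x])
    y = 1.0 + 2.0 * x + 0.01 * rng.normal(size=100)
    print(gradient_descent(X, y))            # approximately [1.0, 2.0]

The learning rate and step count here are arbitrary choices; in practice they are the main tuning knobs.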


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.


1428 questions
7 votes • 2 answers

Behavioral difference between Gradient Descent and Hill Climbing

I'm trying to understand the difference between these two algorithms and how they differ in solving a problem. I have looked at the algorithms and their internals. It would be good to hear from others who are already experienced with them.…
PRCube • 566 • 2 • 6 • 19
7 votes • 4 answers

Implementing gradient descent for multiple variables in Octave using "sum"

I'm doing Andrew Ng's course on Machine Learning and I'm trying to wrap my head around the vectorised implementation of gradient descent for multiple variables which is an optional exercise in the course. This is the algorithm in question (taken…
Nobilis • 7,310 • 1 • 33 • 67
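
For the question above, the vectorised update the exercise is driving at can be sketched in NumPy roughly as follows (this is not the course's Octave code; X, y, theta and alpha are the conventional names used in such exercises):

    import numpy as np

    # One vectorised batch update for linear regression with multiple features.
    # The per-feature sum(...) form is absorbed into the single matrix product.
    def gradient_step(theta, X, y, alpha):
        m = len(y)
        return theta - (alpha / m) * (X.T @ (X @ theta - y))

The Octave equivalent is usually written in the single-line form theta = theta - (alpha/m) * X' * (X*theta - y).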
7 votes • 1 answer

Explanation for Coordinate Descent and Subgradient

How can one get an easy explanation of coordinate descent and the subgradient solution in the context of lasso? An intuitive explanation followed by a proof would be helpful.
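
No proof here, but a rough Python sketch of the intuition: coordinate descent for lasso fixes all coefficients except one, solves that one-dimensional problem exactly, and cycles. The absolute-value penalty makes each one-dimensional solution a soft-thresholding step, which is exactly where the subgradient of |w_j| shows up. The sketch below assumes the objective (1/2)||y - Xw||^2 + lam*||w||_1; all names are illustrative:

    import numpy as np

    def soft_threshold(rho, lam):
        # Closed-form minimizer of a 1-D quadratic plus an L1 term
        # (derived from the subgradient optimality condition).
        if rho < -lam:
            return rho + lam
        if rho > lam:
            return rho - lam
        return 0.0

    def lasso_coordinate_descent(X, y, lam, n_iters=100):
        n_features = X.shape[1]
        w = np.zeros(n_features)
        for _ in range(n_iters):
            for j in range(n_features):
                # Partial residual: remove feature j's current contribution.
                r_j = y - X @ w + X[:, j] * w[j]
                rho = X[:, j] @ r_j
                w[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
            # (convergence check omitted in this sketch)
        return w
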
7 votes • 2 answers

Full-matrix approach to backpropagation in Artificial Neural Network

I have been learning about Artificial Neural Networks (ANN) recently and have got code working and running in Python based on mini-batch training. I followed Michael Nielsen's book Neural Networks and Deep Learning, where there is a step by step…
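
Roughly, the "full-matrix" approach asked about above replaces the per-example loop with matrix operations over the whole mini-batch at once; a hedged NumPy sketch for a single hidden layer with sigmoid activations and a quadratic cost (the variable names are illustrative, not the book's):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def full_matrix_backprop(X, Y, W1, b1, W2, b2):
        """One forward/backward pass over an entire mini-batch.
        X: (batch, d_in), Y: (batch, d_out); returns gradients averaged over the batch."""
        m = X.shape[0]
        # Forward pass, one row per example.
        Z1 = X @ W1 + b1
        A1 = sigmoid(Z1)
        Z2 = A1 @ W2 + b2
        A2 = sigmoid(Z2)
        # Backward pass, still batched: each delta keeps one row per example.
        dZ2 = (A2 - Y) * A2 * (1 - A2)       # quadratic cost with a sigmoid output layer
        dW2 = A1.T @ dZ2 / m
        db2 = dZ2.sum(axis=0) / m
        dZ1 = (dZ2 @ W2.T) * A1 * (1 - A1)
        dW1 = X.T @ dZ1 / m
        db1 = dZ1.sum(axis=0) / m
        return dW1, db1, dW2, db2
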
7 votes • 1 answer

Clarification in the Theano tutorial

I am reading this tutorial provided on the home page of the Theano documentation. I am not sure about the code given under the gradient descent section; I have doubts about the for loop. If you initialize the 'param_update' variable to…
Abhishek • 3,337 • 4 • 32 • 51
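
Without reproducing the Theano tutorial's code, the pattern such a loop usually implements is a momentum-style update in which 'param_update' acts as a persistent velocity that is itself updated on every step. A plain-Python sketch of one common formulation (the tutorial's exact coefficients and signs may differ):

    def momentum_step(param, velocity, grad, lr=0.01, momentum=0.9):
        # The velocity is a decayed running combination of past gradients;
        # keeping it between calls is what the shared 'param_update' variable does.
        velocity = momentum * velocity - lr * grad
        param = param + velocity
        return param, velocity
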
7 votes • 2 answers

Gradient Descent: Do we iterate over ALL of the training set with each step in GD, or do we change GD for each training set?

I've taught myself machine learning with some online resources, but I have a question about gradient descent that I couldn't figure out. The formula for gradient descent is given by the following logistic regression: Repeat { θj =…
Terence Chow • 10,755 • 24 • 78 • 141
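
The distinction the question above is after, sketched for logistic regression in NumPy (names are illustrative): batch gradient descent sums over all m training examples for every single parameter update, whereas stochastic gradient descent updates the parameters after each individual example.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def batch_step(theta, X, y, alpha):
        # ONE update that uses ALL m training examples.
        m = len(y)
        return theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))

    def stochastic_epoch(theta, X, y, alpha):
        # One pass over the data that updates theta after EACH example.
        for i in np.random.permutation(len(y)):
            theta = theta - alpha * (sigmoid(X[i] @ theta) - y[i]) * X[i]
        return theta
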
7 votes • 1 answer

Gradient Descent Optimization in CUDA

I will code my first relatively big CUDA project as Gradient Descent Optimization for machine learning purposes. I would like to benefit from crowd wisdom about some useful native CUDA functions that might be a shortcut to use in the…
erogol • 13,156 • 33 • 101 • 155
6 votes • 1 answer

Get positive and negative part of gradient for loss function in PyTorch

I want to implement non-negative matrix factorization using PyTorch. Here is my initial implementation: def nmf(X, k, lr, epochs): # X: input matrix of size (m, n) # k: number of latent factors # lr: learning rate # epochs: number of…
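
Since the excerpt is cut off, here is only a hedged sketch of the general shape such a PyTorch loop can take: projected gradient descent that clamps W and H back onto non-negative values after every step. This is not the asker's code, and it is not the multiplicative-update rule for which the positive and negative parts of the gradient are usually separated:

    import torch

    def nmf(X, k, lr=0.01, epochs=1000):
        m, n = X.shape
        W = torch.rand(m, k, requires_grad=True)
        H = torch.rand(k, n, requires_grad=True)
        for _ in range(epochs):
            loss = torch.norm(X - W @ H) ** 2
            loss.backward()
            with torch.no_grad():
                W -= lr * W.grad
                H -= lr * H.grad
                W.clamp_(min=0)          # project back onto the non-negative orthant
                H.clamp_(min=0)
            W.grad.zero_()
            H.grad.zero_()
        return W.detach(), H.detach()
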
6 votes • 1 answer

Why can't I get the result I got with the sklearn LogisticRegression with the coefficients_sgd method?

from math import exp import numpy as np from sklearn.linear_model import LogisticRegression I used the code below from How To Implement Logistic Regression From Scratch in Python: def predict(row, coefficients): yhat = coefficients[0] for i in…
user16386186
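
For readers following along, here is a hedged reconstruction of the general shape of such a from-scratch predict/SGD pair (not guaranteed to match the linked tutorial line for line; the last column of each row is assumed to hold the label):

    from math import exp

    def predict(row, coefficients):
        # Linear combination of the inputs, passed through the logistic function.
        yhat = coefficients[0]
        for i in range(len(row) - 1):
            yhat += coefficients[i + 1] * row[i]
        return 1.0 / (1.0 + exp(-yhat))

    def coefficients_sgd(train, l_rate, n_epoch):
        # Stochastic gradient descent: coefficients are updated after every row.
        coef = [0.0] * len(train[0])
        for _ in range(n_epoch):
            for row in train:
                yhat = predict(row, coef)
                error = row[-1] - yhat
                coef[0] += l_rate * error * yhat * (1.0 - yhat)
                for i in range(len(row) - 1):
                    coef[i + 1] += l_rate * error * yhat * (1.0 - yhat) * row[i]
        return coef

One common reason such a loop does not reproduce sklearn's LogisticRegression is that sklearn applies L2 regularization by default (controlled by C), while a bare SGD loop like this one does not.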
6 votes • 1 answer

PyTorch `torch.no_grad` vs `torch.inference_mode`

PyTorch has new functionality torch.inference_mode as of v1.9 which is "analogous to torch.no_grad... Code run under this mode gets better performance by disabling view tracking and version counter bumps." If I am just evaluating my model at test…
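
For plain test-time evaluation, either context manager fits; a small usage sketch (the model and batch below are placeholders):

    import torch

    model = torch.nn.Linear(10, 2)    # placeholder model
    x = torch.randn(4, 10)            # placeholder test batch

    model.eval()                      # also disables dropout / batch-norm updates

    with torch.no_grad():             # no gradient tracking
        out1 = model(x)

    with torch.inference_mode():      # additionally skips view tracking and version counters
        out2 = model(x)

One practical difference to keep in mind: tensors created under inference_mode cannot later take part in operations recorded by autograd, whereas tensors produced under no_grad can still be fed into a graph as constants.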
6 votes • 0 answers

PyTorch Autograd Differentiated Tensors appears to not have been used in the graph

I'm trying to improve a CNN I made by implementing a weighted loss method described in this paper. To do this, I looked into this notebook which implements the pseudo-code of the method described in the paper. When translating their code to my…
6 votes • 1 answer

What is the default batch size of pytorch SGD?

What does pytorch SGD do if I feed the whole data and do not specify the batch size? I don't see any "stochastic" or "randomness" in that case. For example, in the following simple code, I feed the whole data (x,y) into a model. optimizer =…
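
As context for the question above: torch.optim.SGD has no notion of a batch size at all; it simply applies whatever gradient the preceding backward() produced, so feeding the whole data set gives plain full-batch gradient descent. The "stochastic" part comes from how the data is fed, typically via a DataLoader. A sketch with placeholder shapes:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    x, y = torch.randn(100, 3), torch.randn(100, 1)   # placeholder data
    model = torch.nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    # Mini-batches (and hence the randomness) come from the DataLoader, not the optimizer.
    loader = DataLoader(TensorDataset(x, y), batch_size=10, shuffle=True)
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
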
6 votes • 1 answer

TensorFlow: How can I inspect gradients and weights in eager execution?

I am using TensorFlow 1.12 in eager execution, and I want to inspect the values of my gradients and my weights at different points during training for debugging purposes. This answer uses TensorBoard to get nice graphs of weight and gradient…
user4028648
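
One direct way to inspect values in eager execution is to compute the gradients yourself with tf.GradientTape and print or log them, instead of routing everything through TensorBoard. A hedged sketch with a placeholder Keras model (written against current eager APIs; on TF 1.12 you would call tf.enable_eager_execution() first):

    import numpy as np
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])   # placeholder model
    x = tf.constant(np.random.randn(8, 4), dtype=tf.float32)
    y = tf.constant(np.random.randn(8, 1), dtype=tf.float32)

    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))

    grads = tape.gradient(loss, model.trainable_weights)
    for var, grad in zip(model.trainable_weights, grads):
        print(var.name, var.numpy(), grad.numpy())   # inspect weights and their gradients
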
6 votes • 3 answers

Tensorflow, Keras: How to create a trainable variable that only updates in specific positions?

For example, y=Ax where A is a diagonal matrix, with its trainable weights (w1, w2, w3) on the diagonal. A = [w1 ... ... ... w2 ... ... ... w3] How to create such a trainable A in Tensorflow or Keras? If I try A = tf.Variable(np.eye(3)),…
null • 1,167 • 1 • 12 • 30
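
One common pattern for the question above is to keep only the diagonal as a trainable vector and rebuild the matrix from it on every forward pass, so the off-diagonal zeros are never parameters at all. A hedged TensorFlow sketch (tf.linalg.diag and tf.linalg.matvec are real ops in recent TF versions; the surrounding names are placeholders):

    import tensorflow as tf

    w = tf.Variable(tf.ones(3))           # only the three diagonal weights are trainable
    x = tf.constant([1.0, 2.0, 3.0])      # placeholder input

    with tf.GradientTape() as tape:
        A = tf.linalg.diag(w)             # build the diagonal matrix from the vector
        y = tf.linalg.matvec(A, x)        # y = A x
        loss = tf.reduce_sum(y)

    grads = tape.gradient(loss, [w])      # gradients flow only into the diagonal entries

In Keras the same idea is usually wrapped in a custom layer whose only weight is the length-3 vector.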
6 votes • 2 answers

Escaping local minimum with tensorflow

I am solving this system of equations with tensorflow: f1 = y - x*x = 0 f2 = x - (y - 2)*(y - 2) + 1.1 = 0 If I choose a bad starting point (x,y)=(-1.3,2), then I get stuck in a local minimum while optimising f1^2+f2^2 with this code: f1 = y - x*x f2 = x - (y -…
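
Since the excerpt is cut off, only the general idea is sketched here: one common way out of such a local minimum is to restart the minimization of f1^2 + f2^2 from several random points and keep the best run. A plain-NumPy sketch of that strategy (no TensorFlow, just the same objective with a hand-written gradient; the learning rate and start range are arbitrary):

    import numpy as np

    def objective_and_grad(x, y):
        f1 = y - x * x
        f2 = x - (y - 2) * (y - 2) + 1.1
        F = f1 ** 2 + f2 ** 2
        dF_dx = 2 * f1 * (-2 * x) + 2 * f2
        dF_dy = 2 * f1 + 2 * f2 * (-2 * (y - 2))
        return F, dF_dx, dF_dy

    def descend(x, y, lr=0.005, steps=4000):
        for _ in range(steps):
            _, gx, gy = objective_and_grad(x, y)
            x, y = x - lr * gx, y - lr * gy
        return objective_and_grad(x, y)[0], x, y

    # Random restarts: run gradient descent from several starting points and
    # keep whichever run reaches the lowest value of f1^2 + f2^2.
    rng = np.random.default_rng(0)
    best = min(descend(*rng.uniform(-2, 2, size=2)) for _ in range(20))
    print(best)   # (best objective value, x, y)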