Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates partial derivatives (gradients) of the function and descends in steps proportional to those partial derivatives. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
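The update rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `gradient_descent`, the fixed `learning_rate`, the step count, and the example function f(x) = (x - 3)**2 are all assumptions chosen for the demonstration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move proportionally to the negative of the gradient at the current point.
        x = x - learning_rate * grad(x)
    return x


# Hypothetical example: f(x) = (x - 3)**2 has gradient 2 * (x - 3)
# and a single minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

With this step size each iteration shrinks the distance to the minimum by a constant factor, so `x_min` ends up very close to 3. A learning rate that is too large for the function at hand would make the iterates diverge instead.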

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.
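For contrast, here is a small sketch of a problem where the parameters can be calculated analytically: fitting a line y = w*x + b by least squares. The closed-form answer is used only to check the iterative one. The helper names, the toy data, the learning rate, and the step count are all hypothetical choices for illustration.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept from the analytic formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the mean squared error of w*x + b by gradient descent."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Update both parameters from the same set of errors.
        w, b = w - learning_rate * grad_w, b - learning_rate * grad_b
    return w, b


# Made-up toy data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
w_exact, b_exact = fit_closed_form(xs, ys)
w_gd, b_gd = fit_gradient_descent(xs, ys)
```

On this data the two approaches agree to within numerical tolerance. When no closed form is available, only the iterative search remains.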

Gradient descent is also known as steepest descent, or the method of steepest descent.


Tag usage:

Questions on this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
4
votes
1 answer

Projection onto unit simplex using gradient descent in PyTorch

In Professor Boyd's homework solution for projection onto the unit simplex, he winds up with the following equation: g_of_nu = (1/2)*torch.norm(-relu(-(x-nu)))**2 + nu*(torch.sum(x) -1) - x.size()[0]*nu**2 If one calculates nu*, then the projection…
Saeed
  • 598
  • 10
  • 19
4
votes
1 answer

In PyTorch, how do I update a neural network via the average gradient from a list of losses?

I have a toy reinforcement learning project based on the REINFORCE algorithm (here's PyTorch's implementation) that I would like to add batch updates to. In RL, the "target" can only be created after a "prediction" has been made, so standard…
Josh
  • 167
  • 1
  • 13
4
votes
1 answer

Does the SGD optimizer in PyTorch actually perform the gradient descent algorithm?

I'm trying to compare the convergence rates of the SGD and GD algorithms for neural networks. In PyTorch, we often use the SGD optimizer as follows. train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=64,…
mathgeek
  • 43
  • 5
4
votes
1 answer

Understanding the Adam optimizer intuitively

Following the pseudocode of Adam, I wrote some code: from matplotlib import pyplot as plt import numpy as np # np.random.seed(42) num = 100 x = np.arange(num).tolist() # The following 3 sets of g_list stand for 3 types of gradient changes: #…
4
votes
1 answer

Manually update momentum terms in PyTorch optimizers

The Adam optimizer has several terms that are used to add "momentum" to the gradient descent algorithm, making the step size for each variable adaptive: Specifically, in the case of Adam here, I refer to the m-hat and v-hat terms. There are times,…
user650261
  • 2,115
  • 5
  • 24
  • 47
4
votes
2 answers

Vectorized Regularized Gradient Descent not passing numerical check

I've written an implementation in Python using NumPy of vectorized regularized Gradient descent for logistic regression. I've used a numerical check method to check that my implementation is correct. The numerical check verifies my implementation of…
4
votes
1 answer

Numerical jump in sklearn GradientBoostingRegressor

I have been investigating a "hand-rolled" version of a gradient boosted regression tree. I find that the errors agree very well with the sklearn GradientBoostingRegressor module until I increase the tree building loop above a certain value. I am…
AJR
  • 105
  • 6
4
votes
1 answer

How are neural networks, loss and optimizer connected in PyTorch?

I've seen answers to this question, but I still don't understand it at all. As far as I know, this is the most basic setup: net = CustomClassInheritingFromModuleWithDefinedInitAndForward() criterion = nn.SomeLossClass() optimizer =…
4
votes
1 answer

Use of scheduler with self-adjusting optimizers in PyTorch

In PyTorch, the weight adjustment policy is determined by the optimizer, and the learning rate is adjusted with a scheduler. When the optimizer is SGD, there is only one learning rate and this is straightforward. When using Adagrad, Adam, or any…
Syncrossus
  • 570
  • 3
  • 17
4
votes
3 answers

Why is SGDClassifier with hinge loss faster than the SVC implementation in scikit-learn?

As we know, for a support vector machine we can use SVC as well as SGDClassifier with hinge loss. Is SGDClassifier with hinge loss faster than SVC? Why? Links to both implementations of SVC in scikit-learn: SVC…
Asis
  • 183
  • 3
  • 15
4
votes
1 answer

What is the difference between step size and learning rate in machine learning?

I am using TensorFlow to implement some basic ML code. I was wondering if anyone could give me a short explanation of the meaning of and difference between step size and learning rate in the following functions. I used…
4
votes
1 answer

Simultaneously update theta0 and theta1 to calculate gradient descent in Python

I am taking the machine learning course on Coursera. One topic is gradient descent for optimizing the cost function. It says to simultaneously update theta0 and theta1 so that it minimizes the cost function and reaches the global…
Serenity
  • 3,884
  • 6
  • 44
  • 87
4
votes
0 answers

How can I implement this L1 norm Robust PCA equation in a more efficient way?

I recently learned in class that the Principal Component Analysis method aims to approximate a matrix X by the product of two matrices, Z*W. If X is an n x d matrix, then Z is an n x k matrix and W is a k x d matrix. In that case, the objective function the…
Peter
  • 460
  • 6
  • 23
4
votes
0 answers

Machine Learning: Stochastic gradient descent for logistic regression in R: Calculating Eout and average number of epochs

I am trying to write code to solve the following problem (as stated in HW5 of the Caltech course Learning from Data): In this problem you will create your own target function f (probability in this case) and data set D to see how Logistic …
4
votes
1 answer

OpenAI Gradient Checkpointing with TensorFlow Eager Execution

I have recently switched to TensorFlow Eager (currently working with TF 1.8.0) and like it a lot. However, I now have quite a large model which does not fit into my GPU memory (GTX 1080Ti, 12GB VRAM) when run with the Gradient Tape which is needed…