Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is the model's error function.
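For illustration, a minimal sketch of that idea in Python/NumPy, fitting a line to toy data by minimizing a squared-error function (the data, learning rate, and iteration count below are arbitrary choices, not part of the tag description):

    import numpy as np

    # Toy data: y is roughly 2*x + 1 plus noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)
    y = 2 * x + 1 + 0.1 * rng.standard_normal(50)

    w, b = 0.0, 0.0          # parameters of the model y_hat = w*x + b
    lr = 0.1                 # step size (learning rate)

    for _ in range(2000):
        y_hat = w * x + b
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2 * np.mean((y_hat - y) * x)
        grad_b = 2 * np.mean(y_hat - y)
        # Step in the direction of the negative gradient.
        w -= lr * grad_w
        b -= lr * grad_b

    print(w, b)  # roughly 2 and 1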

Wiki:

Gradient descent is a first-order iterative optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.


1428 questions
4 votes, 0 answers

Proper asynchronous stochastic gradient descent with celery

I have to use celery to parallelize a stochastic gradient descent algorithm. Celery might not be the best choice for this, but that is still my question =) The algorithm looks like this, where datas is the matrix of samples: #Random…
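Not the asker's code, but one hedged way to split gradient work across celery workers is a synchronous parameter-averaging step (simpler than a truly asynchronous scheme); the broker URL, task name, and data layout below are assumptions:

    from celery import Celery, group
    import numpy as np

    # Hypothetical broker/backend; adjust to your setup.
    app = Celery("sgd", broker="redis://localhost:6379/0",
                 backend="redis://localhost:6379/0")

    @app.task
    def batch_gradient(w, batch):
        """Least-squares gradient for one data shard; w and batch are plain
        lists so they serialize cleanly through the broker."""
        w = np.asarray(w)
        X = np.asarray([row[:-1] for row in batch])
        y = np.asarray([row[-1] for row in batch])
        grad = 2 * X.T @ (X @ w - y) / len(y)
        return grad.tolist()

    def parallel_step(w, shards, lr=0.01):
        # Dispatch one gradient task per shard and average the results.
        job = group(batch_gradient.s(w.tolist(), shard) for shard in shards)
        grads = np.asarray(job.apply_async().get())
        return w - lr * grads.mean(axis=0)
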
4 votes, 1 answer

Gradient descent stochastic update - Stopping criterion and update rule - Machine Learning

My dataset has m features and n data points. Let w be a vector (to be estimated). I'm trying to implement gradient descent with the stochastic update method. My function to minimize is the least mean square. The update algorithm is shown below: for i = 1 ...…
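For reference, a hedged NumPy sketch of that kind of per-sample update with one common stopping criterion (the names X, y, lr, and the tolerance are illustrative, not taken from the question):

    import numpy as np

    def sgd_lms(X, y, lr=0.01, epochs=100, tol=1e-6):
        """Stochastic gradient descent for the per-sample loss 0.5*(w.x - y)^2."""
        n, m = X.shape
        w = np.zeros(m)
        for _ in range(epochs):
            w_old = w.copy()
            for i in np.random.permutation(n):
                err = X[i] @ w - y[i]
                w -= lr * err * X[i]          # per-sample gradient step
            # One common stopping criterion: small change in w over an epoch.
            if np.linalg.norm(w - w_old) < tol:
                break
        return w
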
3 votes, 1 answer

Does the choice of an activation function and initial weights have any bearing on whether a Neural Network gets stuck in a local minimum?

I posted this question yesterday asking if my Neural Network (which I'm training via backpropagation using stochastic gradient descent) was getting stuck in a local minimum. The following papers talk about the problem of local minima in an XOR…
3 votes, 1 answer

Optimizing with non-negative constraints

Consider the following functions import numpy as np import scipy.optimize as opt import math # Periodic indexation def pl(list, i): return list[i % len(list)] # Main function (index j) def RT(list, j, L): return…
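Not the question's exact functions, but non-negativity constraints can often be imposed by passing bounds to scipy.optimize.minimize; the objective below is a stand-in:

    import numpy as np
    from scipy import optimize

    def objective(x):
        # Stand-in objective; replace with the real function to be minimized.
        return np.sum((x - np.array([1.0, -2.0, 3.0])) ** 2)

    x0 = np.ones(3)
    # One (0, None) bound pair per variable enforces x >= 0.
    res = optimize.minimize(objective, x0, method="L-BFGS-B",
                            bounds=[(0, None)] * len(x0))
    print(res.x)   # the second coordinate is clipped at 0
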
3 votes, 1 answer

Autodiff implementation for gradient calculation

I have worked through some papers about the autodiff algorithm to implement it for myself (for learning purposes). I compared my algorithm in test cases to the output of tensorflow and their outputs did not match in most cases. Therefore I worked…
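One hedged way to get reference values for such a comparison is tf.GradientTape plus an independent finite-difference check; the test expression below is made up:

    import tensorflow as tf

    def f(x):
        return tf.sin(x) * x ** 2      # example test expression

    x = tf.Variable(2.0)
    with tf.GradientTape() as tape:
        y = f(x)
    ref = float(tape.gradient(y, x))   # TensorFlow's gradient

    # Central finite difference as an independent sanity check.
    h = 1e-5
    fd = (float(f(tf.constant(2.0 + h))) - float(f(tf.constant(2.0 - h)))) / (2 * h)
    print(ref, fd)                     # both should match your own autodiff output
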
3 votes, 1 answer

Logistic regression from scratch: error keeps increasing

I have implemented logistic regression from scratch; however, when I run the script the algorithm always predicts the wrong label. I've tried changing the training output and test_output by switching all 1 to 0 and vice versa, but it always predicts the…
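For comparison, a hedged NumPy sketch of batch gradient descent for logistic regression; a sign error in the update is a common cause of a rising loss (the names X, y, lr are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic(X, y, lr=0.1, epochs=1000):
        """Batch gradient descent on the cross-entropy loss.
        The update must subtract the gradient; adding it makes the error grow."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            p = sigmoid(X @ w + b)
            grad_w = X.T @ (p - y) / len(y)
            grad_b = np.mean(p - y)
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b
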
3 votes, 1 answer

Understanding gradient computation using backward() in PyTorch

I'm trying to understand the basic pytorch autograd system: x = torch.tensor(10., requires_grad=True) print('tensor:',x) x.backward() print('gradient:',x.grad) output: tensor: tensor(10., requires_grad=True) gradient: tensor(1.) since x is a…
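As a hedged illustration of the same API on a non-trivial function (not the asker's code):

    import torch

    x = torch.tensor(10., requires_grad=True)
    y = x ** 2            # some function of x
    y.backward()          # populates x.grad with dy/dx
    print(x.grad)         # tensor(20.) because dy/dx = 2*x

    # In the question's snippet, backward() is called on x itself,
    # so the "function" is the identity and dx/dx = 1.
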
3 votes, 1 answer

How to find 3d point with minimum sum of euclidean distances to all given segments?

N segments in 3d space are given. Each segment is represented by 2 points. The problem is to find the point with the minimal possible sum of distances to all segments.
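One hedged way to attack this numerically is to minimize the summed point-to-segment distance with a general-purpose optimizer; the segments below are made up:

    import numpy as np
    from scipy import optimize

    def point_segment_distance(p, a, b):
        """Euclidean distance from point p to segment [a, b] in 3D."""
        ab = b - a
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        return np.linalg.norm(p - (a + t * ab))

    def total_distance(p, segments):
        return sum(point_segment_distance(p, a, b) for a, b in segments)

    # Example with made-up segments.
    segments = [(np.array([0., 0., 0.]), np.array([1., 0., 0.])),
                (np.array([0., 1., 0.]), np.array([0., 1., 1.]))]
    res = optimize.minimize(total_distance, x0=np.zeros(3), args=(segments,))
    print(res.x)
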
3 votes, 1 answer

How to plot gradient descent using plotly

I have been trying to replicate some work similar to this code below, but when I try to use the data from this link https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv it's throwing some error. I think it's because…
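Not the asker's code, but a hedged sketch of plotting a surface together with a gradient-descent path using plotly.graph_objects (the surface and step size are arbitrary):

    import numpy as np
    import plotly.graph_objects as go

    # Simple bowl-shaped surface and a gradient-descent path on it.
    xs = np.linspace(-2, 2, 50)
    ys = np.linspace(-2, 2, 50)
    X, Y = np.meshgrid(xs, ys)
    Z = X ** 2 + Y ** 2

    path = [np.array([1.8, -1.5])]
    for _ in range(30):
        px, py = path[-1]
        path.append(path[-1] - 0.1 * np.array([2 * px, 2 * py]))
    path = np.array(path)

    fig = go.Figure(data=[
        go.Surface(x=X, y=Y, z=Z, opacity=0.7),
        go.Scatter3d(x=path[:, 0], y=path[:, 1],
                     z=path[:, 0] ** 2 + path[:, 1] ** 2,
                     mode="lines+markers"),
    ])
    fig.show()
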
3 votes, 0 answers

Can this nested for-loop be rewritten using tensorflow functions to allow for gradient calculation?

I wrote a function that sums only certain q-values from a tensor, those being the values corresponding to previous actions taken. I need this function to be auto-differentiable, but my current implementation uses a numpy array with nested for-loops,…
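A commonly used graph-friendly pattern for that kind of selection is one-hot masking, which keeps the operation differentiable; the tensors below are made-up stand-ins:

    import tensorflow as tf

    q_values = tf.constant([[0.1, 0.9, 0.3],
                            [0.5, 0.2, 0.7]])      # (batch, n_actions)
    actions = tf.constant([1, 2])                  # action taken per sample

    # One-hot masking stays inside the graph, so gradients flow through it.
    mask = tf.one_hot(actions, depth=q_values.shape[-1], dtype=q_values.dtype)
    chosen_q = tf.reduce_sum(q_values * mask, axis=1)   # [0.9, 0.7]
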
3 votes, 3 answers

SGDRegressor() constantly not increasing validation performance

The model fit of my SGDRegressor won't increase or decrease its performance on the validation set (test) after around 20'000 training records. Even if I try to switch penalty, early_stopping (True/False), or alpha, eta0 to extremely high or low levels,…
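One hedged thing to check in such cases is feature scaling, since SGD is very sensitive to feature scale; a sketch with a StandardScaler pipeline (the hyperparameters are placeholders, not a recommendation):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import SGDRegressor

    # Standardizing the inputs often matters more than tuning alpha or eta0.
    model = make_pipeline(
        StandardScaler(),
        SGDRegressor(penalty="l2", max_iter=1000, tol=1e-3),
    )
    # model.fit(X_train, y_train); model.score(X_val, y_val)
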
3 votes, 2 answers

Understanding Gradient Tape with mini batches

In the below example taken from the Keras documentation, I want to understand how grads is computed. Does the gradient grads correspond to the average gradient computed using the batch (x_batch_train, y_batch_train)? In other words, does the algorithm…
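For context, a hedged sketch of the usual setup: when the loss reduces with a mean over the batch, tape.gradient returns the batch-averaged gradient (the model and data below are made up):

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    loss_fn = tf.keras.losses.MeanSquaredError()   # averages over the batch by default

    x_batch = tf.random.normal((32, 4))
    y_batch = tf.random.normal((32, 1))

    with tf.GradientTape() as tape:
        pred = model(x_batch, training=True)
        loss = loss_fn(y_batch, pred)              # mean over the 32 examples
    grads = tape.gradient(loss, model.trainable_variables)
    # Because the loss is a mean, these grads equal the per-example gradients
    # averaged over the mini-batch.
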
3 votes, 2 answers

How to write cost function formula from Andrew Ng assignment in Octave?

My implementation (see below) gives the scalar value 3.18, which is not the right answer. The value should be 0.693. Where does my code deviate from the equation? Here are the instructions to solve for the data to run the cost function method in…
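For reference, the same vectorized cross-entropy cost written in NumPy rather than Octave (variable names are assumptions):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cost(theta, X, y):
        m = len(y)
        h = sigmoid(X @ theta)
        # A frequent mistake is squaring (h - y) here; the logistic cost uses logs.
        return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

    # With theta = 0, h = 0.5 everywhere and the cost is -log(0.5), about 0.693.
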
3 votes, 1 answer

What is the difference between clipnorm and clipval on Keras

What is the difference between clipnorm and clipval? Ex: opt = SGD(lr=0.01, momentum=0.9, clipnorm=1.0)
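A hedged illustration of the two options as they appear in the Keras optimizer API (the keyword is spelled clipvalue; the thresholds are arbitrary):

    from tensorflow.keras.optimizers import SGD

    # clipnorm rescales each gradient tensor whose L2 norm exceeds 1.0,
    # preserving its direction.
    opt_norm = SGD(learning_rate=0.01, momentum=0.9, clipnorm=1.0)

    # clipvalue clips every gradient component independently into [-0.5, 0.5],
    # which can change the gradient's direction.
    opt_value = SGD(learning_rate=0.01, momentum=0.9, clipvalue=0.5)
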
3 votes, 1 answer

Find minimum return value of function with two parameters

I have an error function and the sum of all errors over self.array: #'array' looks something like this [[x1,y1],[x2,y2],[x3,y3],...,[xn,yn]] #'distances' is an array with same length as array with different int values in it def calcError(self,n,X,Y):…
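Not the asker's calcError, but a hedged sketch of minimizing a two-parameter error over a list of [x, y] points with scipy.optimize.minimize (the stand-in error is made up):

    import numpy as np
    from scipy import optimize

    points = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.2]])   # made-up [[x, y], ...]

    def total_error(params):
        X, Y = params
        # Stand-in error: sum of distances from (X, Y) to every point.
        return np.sum(np.hypot(points[:, 0] - X, points[:, 1] - Y))

    res = optimize.minimize(total_error, x0=[0.0, 0.0])
    print(res.x, res.fun)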