Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
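
As a minimal sketch of the update rule described above (the quadratic objective and all names here are hypothetical, chosen only for illustration):

```python
# Minimize the hypothetical quadratic f(x, y) = x**2 + y**2,
# whose gradient (2x, 2y) is derived by hand.

def grad(x, y):
    return 2 * x, 2 * y

def gradient_descent(x, y, learning_rate=0.1, steps=100):
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Step proportional to the *negative* of the gradient.
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y

print(gradient_descent(3.0, 4.0))  # approaches the minimum at (0, 0)
```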


Tag usage:

Questions should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
0
votes
1 answer

Is my gradient checking method correct and my gradient calculation wrong, or vice versa?

My network only achieves around 80%, but the reported best score is around 85% accuracy. I'm using the same input data and the same initialization. I don't know what's wrong, so I tried to check my gradients and implemented what is recommended for gradient…
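
The usual check being referred to compares the analytic gradient against a central-difference estimate and reports a relative error; a sketch, with hypothetical names:

```python
import numpy as np

def relative_gradient_error(f, analytic, x, eps=1e-5):
    """Compare an analytic gradient with a central-difference estimate."""
    numeric = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        numeric[i] = (f(xp) - f(xm)) / (2 * eps)
    # Relative error; values around 1e-7 or smaller usually mean agreement.
    return np.linalg.norm(numeric - analytic) / (
        np.linalg.norm(numeric) + np.linalg.norm(analytic))

# Hypothetical check on f(x) = sum(x**2), whose true gradient is 2x:
x = np.array([1.0, -2.0, 3.0])
print(relative_gradient_error(lambda v: np.sum(v ** 2), 2 * x, x))
```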
0
votes
2 answers

What does the learning algorithm output in linear regression?

Reading the course notes of Andrew Ng's machine learning course, it states for linear regression: Take a training set and pass it into a learning algorithm. The algorithm outputs a function h (the hypothesis). h takes an input and tries to output…
blue-sky
  • 51,962
  • 152
  • 427
  • 752
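
In that course's notation the hypothesis is h(x) = θ0 + θ1·x: a function mapping an input to a predicted output. A tiny sketch with made-up coefficients:

```python
# theta0 and theta1 are hypothetical values a learning algorithm might output.
theta0, theta1 = 1.5, 0.8

def h(x):
    # h takes an input and outputs a prediction: h(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

print(h(10.0))  # predicts 9.5 for input 10.0
```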
0
votes
1 answer

Comparing the accuracies of different numerical gradients?

Suppose that I have a function double f(vector<double> &x) { // do something with x return answer; } Mathematically, f is a continuous function with respect to each component of x. Now I want to evaluate the numerical gradient of f. There are…
Hieu Pham
  • 97
  • 1
  • 8
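
The usual comparison here is between the one-sided (forward) difference, accurate to O(eps), and the central difference, accurate to O(eps**2); a sketch in Python (the original question is C++), with hypothetical names:

```python
def forward_diff(f, x, i, eps=1e-6):
    # One-sided estimate; truncation error is O(eps).
    xp = list(x)
    xp[i] += eps
    return (f(xp) - f(x)) / eps

def central_diff(f, x, i, eps=1e-6):
    # Two-sided estimate; truncation error is O(eps**2).
    xp, xm = list(x), list(x)
    xp[i] += eps
    xm[i] -= eps
    return (f(xp) - f(xm)) / (2 * eps)

# Hypothetical test on f(x) = x[0]**2 * x[1] at (2, 3); the true
# derivative with respect to x[0] is 2 * 2 * 3 = 12.
f = lambda v: v[0] ** 2 * v[1]
print(forward_diff(f, [2.0, 3.0], 0))  # ~12.000003
print(central_diff(f, [2.0, 3.0], 0))  # ~12.0 (exact for quadratics)
```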
0
votes
0 answers

Python implementation of Gradient Descent Algorithm isn't working

I am trying to implement a gradient descent algorithm for simple linear regression. For some reason it doesn't seem to be working. from __future__ import division import random def error(x_i,z_i, theta0,theta1): return z_i - theta0 - theta1 *…
razalfuhl
  • 3
  • 1
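
A working version of this kind of per-sample descent, reusing the excerpt's names but otherwise a sketch under the assumed model z = theta0 + theta1 * x:

```python
import random

def error(x_i, z_i, theta0, theta1):
    return z_i - theta0 - theta1 * x_i

def gradient_descent(xs, zs, alpha=0.01, epochs=1000):
    theta0 = theta1 = 0.0
    for _ in range(epochs):
        for x_i, z_i in zip(xs, zs):
            e = error(x_i, z_i, theta0, theta1)
            # Update both parameters from the *same* residual; updating
            # theta0 first and reusing it for theta1 is a common bug.
            theta0 += alpha * e
            theta1 += alpha * e * x_i
    return theta0, theta1

xs = [random.uniform(0, 10) for _ in range(100)]
zs = [3 + 2 * x for x in xs]      # hypothetical data: z = 3 + 2x
print(gradient_descent(xs, zs))   # should approach (3, 2)
```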
0
votes
1 answer

Removing infinity values of a function using exception handling, *args, and **kwargs

I'm currently working through the book Data Science from Scratch by Joel Grus, and I've run across a function that I don't really understand: def safe(f): def safe_f(*args, **kwargs): try: return f(*args, **kwargs) …
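
Completed, the wrapper converts any exception raised by f into infinity, so an optimizer simply treats inputs that break f as infinitely bad; a sketch of how it reads in full (the except branch is reconstructed from the book's context):

```python
import math

def safe(f):
    def safe_f(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except Exception:
            return float('inf')  # "broken" inputs become infinitely bad
    return safe_f

safe_log = safe(math.log)
print(safe_log(10.0))  # 2.302585...
print(safe_log(-1.0))  # inf, instead of raising ValueError
```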
0
votes
1 answer

Octave index out of bounds

The program is an optimized gradient descent. Here is the code: clear all close all [x,y] = meshgrid(-2:0.1:2); z = x.^2 + 100*y.^2; n = 1; …
avers
  • 17
  • 1
  • 3
0
votes
2 answers

calculating loss on GradientBoostingClassifier in python during the run

I have the following code for creating and training a sklearn.ensemble.GradientBoostingClassifier class myMonitor: def __call__(self, i, estimator, locals): proba = estimator.predict_proba(Xp2) myloss = calculateMyLoss(proba,…
WhiteTiger
  • 758
  • 1
  • 7
  • 21
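
For a per-iteration loss, an alternative to the monitor callback (which fit calls after each stage as monitor(i, estimator, locals), stopping training if it returns True) is staged_predict_proba after fitting; a sketch with hypothetical data standing in for the asker's Xp2:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=200, random_state=0)

clf = GradientBoostingClassifier(n_estimators=20, random_state=0)
clf.fit(X, y)

# staged_predict_proba yields class probabilities after each boosting
# stage, so a per-iteration loss falls out once fitting is done.
for i, proba in enumerate(clf.staged_predict_proba(X)):
    print(i, log_loss(y, proba))
```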
0
votes
1 answer

Perceptron learns to reproduce just one pattern all the time

This is rather a weird problem. I have backpropagation code which works perfectly, like this: Now, when I do batch learning I get wrong results, even for a simple scalar function approximation. After training the network…
0
votes
1 answer

Non-Linear, Non-Smooth Optimization in Excel

I am trying to solve a non-linear, non-smooth optimization in Excel. Both the GRG and Evolutionary algorithms fail to give reasonable results (they do not converge in certain cases). The number of constraints is within the Excel recommended…
Alvi John
  • 81
  • 3
  • 3
  • 13
0
votes
1 answer

My RPROP neural network gets stuck

Since the implementation of the algorithm is correct (I checked it hundreds of times), I think I have misunderstood some theoretical facts. I suppose that, given that j refers to the hidden-layer side and k to the output layer, ∂E/∂wjk is calculated by…
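
For reference, the textbook formula for that derivative under squared error (this is the standard backpropagation result, not the asker's code):

```latex
% Hidden-to-output weight gradient; j indexes the hidden layer,
% k the output layer.
\frac{\partial E}{\partial w_{jk}} = \delta_k \, o_j ,
\qquad
\delta_k = (o_k - t_k)\, f'(\mathrm{net}_k)
```

Here o_j is the activation of hidden unit j, o_k and t_k are the output and target of unit k, and net_k is the pre-activation input to unit k.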
0
votes
1 answer

How to solve logistic regression using gradient descent?

I was solving an exercise from an online Coursera course on machine learning. The problem statement is: Suppose that a high school has a dataset representing 40 students who were admitted to college and 40 students who were not admitted. Each (…
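
A sketch of batch gradient descent for logistic regression on that kind of admission data (all data and names here are hypothetical; the scores are scaled into [0, 1] so a fixed step size stays stable):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha=0.5, iters=5000):
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iters):
        h = sigmoid(X @ theta)                # predicted probabilities
        theta -= alpha / m * (X.T @ (h - y))  # gradient of the log loss
    return theta

# Hypothetical toy data: an intercept column plus one scaled exam score.
X = np.column_stack([np.ones(4), [0.35, 0.45, 0.65, 0.85]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(logistic_gradient_descent(X, y))
```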
0
votes
2 answers

Loop inside another loop in R

I have a problem with the results of a loop inside a loop. The inner loop runs only once, choosing the best solution for the first row, and then stops. I would like to keep the best solution for every row of the matrix zmienne. What am I doing…
user3463225
  • 401
  • 9
  • 19
0
votes
1 answer

Apache Spark - org.apache.spark.SparkException: Task not serializable

When attempting to run my method: def doGD() = { allRatings.foreach(rating => gradientDescent(rating)); } I get the error: org.apache.spark.SparkException: Task not serializable I understand that my method of Gradient Descent is not…
monster
  • 1,762
  • 3
  • 20
  • 38
0
votes
1 answer

Stochastic Gradient Descent Convergence Criteria

Currently my convergence criterion for SGD checks whether the MSE ratio is within a specific bound. def compute_mse(data, labels, weights): m = len(labels) hypothesis = np.dot(data,weights) sq_errors = (hypothesis - labels) ** 2 …
indecisivecoder
  • 227
  • 1
  • 3
  • 10
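
Completing the excerpt under the common 1/(2m) normalization, plus one ratio-style stopping test of the kind described (the tolerance is a hypothetical choice, tuned per problem):

```python
import numpy as np

def compute_mse(data, labels, weights):
    m = len(labels)
    hypothesis = np.dot(data, weights)
    sq_errors = (hypothesis - labels) ** 2
    return np.sum(sq_errors) / (2 * m)  # assumes the usual 1/(2m) scaling

def converged(prev_mse, curr_mse, tol=1e-6):
    # Stop when the relative change in MSE between checks is tiny.
    return abs(prev_mse - curr_mse) / max(prev_mse, 1e-12) < tol

print(converged(0.5000010, 0.5000009))  # True: change is ~2e-7 of prev
```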
0
votes
2 answers

Unfolding a Constant Number of Arguments

I want to write a simple gradient descent for a function with 5 parameters in C++. Now I have stumbled upon an implementation design problem: should I sacrifice speed and fold my arguments into a vector/array? Here is what I mean. I can implement the function…
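
The "folded" design the asker weighs, sketched in Python rather than C++ for brevity: one parameter vector in, so the same descent loop serves any number of parameters (the objective and names are hypothetical):

```python
import numpy as np

def f(theta):
    # Hypothetical 5-parameter objective with minimum at (0, 1, 2, 3, 4).
    return np.sum((theta - np.arange(5.0)) ** 2)

def numeric_grad(f, theta, eps=1e-6):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

theta = np.zeros(5)
for _ in range(200):
    theta -= 0.1 * numeric_grad(f, theta)
print(theta)  # approaches [0, 1, 2, 3, 4]
```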