Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function f that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
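
To make the wiki concrete, here is a minimal sketch in Python; the quadratic objective, starting point, and step size are illustrative choices, not part of the tag wiki:

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, iters=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - learning_rate * grad(x)  # step proportional to the negative gradient
    return x

# Example: minimize f(x, y) = (x - 3)^2 + (y + 1)^2, whose gradient is (2(x-3), 2(y+1)).
grad = lambda p: np.array([2 * (p[0] - 3), 2 * (p[1] + 1)])
print(gradient_descent(grad, [0.0, 0.0]))  # converges toward [3, -1]
```
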


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
-1 votes · 1 answer

optimal values of a,b,c,d,e based on data passed to a function with known ground truth values

I have a function which takes several parameters - a, b, c, d, e - and then returns the computed value of z. I also have the ground truth value of z, and I would like to compute the optimal values of a, b, c, d, e which would minimize the error between the…
Ash
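
One common way to attack this kind of problem is to wrap the error in a scalar objective and hand it to a gradient-based optimizer. A sketch using scipy.optimize.minimize, where f, the data, and the least-squares error are hypothetical stand-ins for the asker's setup:

```python
import numpy as np
from scipy.optimize import minimize

def f(params, x):
    a, b, c, d, e = params          # hypothetical five-parameter model
    return a * x**4 + b * x**3 + c * x**2 + d * x + e

x = np.linspace(-1, 1, 50)                    # made-up inputs
z_true = f([1.0, -2.0, 0.5, 3.0, -1.0], x)    # made-up ground truth

def error(params):
    return np.sum((f(params, x) - z_true) ** 2)  # squared error vs ground truth

result = minimize(error, x0=np.zeros(5))  # gradient-based (BFGS by default)
print(result.x)                           # recovered a, b, c, d, e
```
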
-1 votes · 2 answers

Gradient Descent Algorithm, Gradient stepping function

I am trying to understand the Gradient Descent Algorithm. The code here should choose a more optimal line of best fit, given another line of best fit. The function takes the current line-of-best-fit's slope and y-intercept as inputs, as well as a…
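
For reference, such a stepping function usually looks like the following sketch (a generic mean-squared-error version, not the asker's code):

```python
def step_gradient(m, b, points, learning_rate):
    """One gradient descent step for the line y = m*x + b under MSE."""
    n = len(points)
    m_grad = b_grad = 0.0
    for x, y in points:
        err = (m * x + b) - y
        m_grad += 2 * err * x / n   # d(MSE)/dm
        b_grad += 2 * err / n       # d(MSE)/db
    return m - learning_rate * m_grad, b - learning_rate * b_grad

m, b = 0.0, 0.0
for _ in range(1000):
    m, b = step_gradient(m, b, [(1, 2), (2, 4), (3, 6)], learning_rate=0.05)
print(m, b)  # approaches m=2, b=0 for this toy data
```
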
-1 votes · 1 answer

In multi-class logistic regression, does SGD with one training example update all the weights?

In multi-class logistic regression, let's say we use softmax and cross-entropy. Does SGD with one training example update all the weights, or only the portion of the weights associated with the label? For example, the label is one-hot [0,0,1]. Does…
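
A quick way to see the answer: with softmax and cross-entropy, the gradient with respect to the logits is softmax(z) − y, which is nonzero for every class, so a single example updates all weight columns. A small numpy illustration (shapes are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0])            # one training example, 2 features
W = np.zeros((2, 3))                # weight matrix for 3 classes
y = np.array([0, 0, 1])             # one-hot label

z = x @ W                           # logits
p = np.exp(z) / np.exp(z).sum()     # softmax probabilities
grad_W = np.outer(x, p - y)         # gradient of cross-entropy w.r.t. W

print(grad_W)  # nonzero in all three columns, not just the labeled class
```
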
-1 votes · 1 answer

Using conjugate gradient in pose-estimation

Graphical representation of my pipeline. I'm trying to get a pose estimation of a face captured from a video feed. I use a face alignment algorithm from tracking.js. This gives me points which I then use to try to estimate the position of my face in…
-1 votes · 1 answer

Accuracy on training data in Gradient Boosting Classifier - scikit

I am training a GBC. It is a multi-class classifier with 12 output classes. My issue is that I am not getting 100% accuracy when I predict on the training data. In fact, mispredictions happen on the dominant set of classes. (My input is imbalanced and I do…
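
For context, gradient boosting is not guaranteed to reach 100% training accuracy; its capacity is governed by n_estimators, max_depth, and learning_rate. A sketch of how one might check this in scikit-learn, on synthetic data with illustrative settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_classes=12, n_informative=10,
                           random_state=0)
clf = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                 learning_rate=0.1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy rises with more/deeper trees
```
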
-1 votes · 1 answer

How to get the minimum cross section of a 3D volume

There is a 3D volume stored in a 3D array, and a point is given inside the volume. How can I calculate the minimum cross section of the volume through that point?
Silver0427
-1 votes · 2 answers

Why is the code that I have written for Andrew Ng's course not accepted?

Andrew Ng's course on Coursera, which is Stanford's Machine Learning course, features programming assignments that deal with implementing the algorithms taught in class. The goal of this assignment is to implement linear regression through gradient…
user4417148
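
The assignment in question implements batch gradient descent for linear regression with a simultaneous update of theta. The course uses Octave, but the same idea in Python/numpy looks roughly like this sketch (variable names and data are illustrative):

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent for linear regression, updating all of theta at once."""
    m = len(y)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * X.T @ (X @ theta - y)  # simultaneous update
    return theta

X = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)  # column of ones = intercept
y = np.array([1.0, 2.0, 3.0])
print(gradient_descent(X, y, np.zeros(2), alpha=0.1, num_iters=2000))  # ~[0, 1]
```
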
-1 votes · 1 answer

Relationship between logistic regression and stochastic gradient descent, in formula terms

Thinking in formula terms, my understanding is that SGD is applied to the final result of logistic regression. I am not sure whether that is correct; I am just wondering about the relationship between stochastic gradient descent and logistic regression. I'm guessing it works…
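
The relationship, stated in formula terms: logistic regression supplies the model and the loss, and stochastic gradient descent is one way to minimize that loss, one example at a time. The standard per-example update is (a textbook derivation, not taken from the question):

```latex
h_w(x_i) = \sigma(w^\top x_i) = \frac{1}{1 + e^{-w^\top x_i}}
J_i(w) = -y_i \log h_w(x_i) - (1 - y_i) \log\bigl(1 - h_w(x_i)\bigr)
w \leftarrow w - \alpha \, \nabla_w J_i(w) = w - \alpha \, \bigl(h_w(x_i) - y_i\bigr) \, x_i
```
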
-1 votes · 2 answers

How to implement the remaining 'for loop' in this neural network in TensorFlow

I am trying to get a neural network going in TensorFlow. The dataset is simply the length and width of a flower petal, and the output can be either 1/0 depending on type: x = [[3,1.5], [2,1], [4,1.5], [3,1], [3.5,0.5], …
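
A minimal training loop for this kind of two-feature binary classifier might look like the following sketch; it uses the tf.GradientTape API, and the labels are assumed since the excerpt is truncated:

```python
import tensorflow as tf

# Petal length/width inputs from the question; the 0/1 labels here are assumed.
x = tf.constant([[3.0, 1.5], [2.0, 1.0], [4.0, 1.5], [3.0, 1.0], [3.5, 0.5]])
y = tf.constant([[1.0], [0.0], [1.0], [0.0], [1.0]])

w = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(1000):  # the training loop the asker wants to complete
    with tf.GradientTape() as tape:
        logits = tf.matmul(x, w) + b
        loss = tf.reduce_mean(
            tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    grads = tape.gradient(loss, [w, b])
    opt.apply_gradients(zip(grads, [w, b]))
```
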
-1 votes · 1 answer

what does parameters = int(theta.ravel().shape[1]) mean?

Can someone explain that code for me? def gradientDescent(X, y, theta, alpha, iters): temp = np.matrix(np.zeros(theta.shape)) parameters = int(theta.ravel().shape[1]) cost = np.zeros(iters) for i in range(iters): error = (X *…
BorkoP
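
In short, that line counts the model's parameters: theta in this code is an np.matrix, so ravel() flattens it to shape (1, n) and shape[1] is therefore n. A small demonstration:

```python
import numpy as np

theta = np.matrix([[0.0, 0.0, 0.0]])       # 1x3 row matrix of coefficients
print(theta.ravel().shape)                 # (1, 3) -- ravel on np.matrix stays 2-D
parameters = int(theta.ravel().shape[1])   # 3: the number of parameters to update
print(parameters)
```
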
-1 votes · 1 answer

TypeError: only length-1 arrays can be converted to Python scalars (dot product)

I'm writing this algorithm for my final year project. I've debugged a few issues, but I'm stuck on this one. I tried changing the float call but nothing really changed. ----> 8 hypothesis = np.dot(float(x), theta) TypeError: only length-1 arrays can be converted to…
capncook
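
The error comes from calling float() on a whole array; float() accepts only a single number. Converting element-wise (or skipping the conversion) fixes it. A sketch with stand-in values for x and theta:

```python
import numpy as np

x = np.array([1, 2, 3])
theta = np.array([0.5, 0.5, 0.5])

# float(x) raises "TypeError: only length-1 arrays can be converted to Python scalars"
hypothesis = np.dot(x.astype(float), theta)  # convert element-wise instead
print(hypothesis)  # 3.0
```
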
-1 votes · 2 answers

Weights becoming "NaN" in implementation of Neural Networks

I am trying to implement a neural network for classification with 5 hidden layers and softmax cross-entropy in the output layer. The implementation is in Java. For optimization, I have used mini-batch gradient descent (batch size = 100, learning…
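
NaN weights usually mean the gradients or the loss overflowed, e.g. exploding gradients or log(0) inside the cross-entropy. Typical remedies are a smaller learning rate, gradient clipping, and clamping probabilities before the log. The question's code is Java; the numpy sketch below only illustrates the two guards:

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale the gradient if its norm exceeds max_norm (gradient clipping)."""
    norm = np.linalg.norm(grads)
    return grads * (max_norm / norm) if norm > max_norm else grads

def safe_cross_entropy(probs, one_hot):
    """Clamp probabilities away from 0 so log() cannot produce -inf/NaN."""
    return -np.sum(one_hot * np.log(np.clip(probs, 1e-12, 1.0)))

print(clip_gradients(np.array([100.0, -400.0])))                      # rescaled
print(safe_cross_entropy(np.array([1e-30, 1.0]), np.array([1, 0])))   # finite
```
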
-1 votes · 1 answer

Gradient descent - can I plot the function that I am minimizing? Linear regression

I'm new to machine learning. I started with linear regression using gradient descent. I have Python code for this and I understand how it works. My question is: the gradient descent algorithm minimizes a function; can I plot this function? I want to see what the…
lukassz
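
Yes: for linear regression the minimized function is the mean squared error J(theta), and the usual plot is J against the iteration number. A matplotlib sketch with made-up data and an illustrative learning rate:

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.c_[np.ones(5), np.arange(5)]          # intercept column plus one feature
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])      # made-up data: y = 2x + 1
theta, alpha, costs = np.zeros(2), 0.02, []

for _ in range(200):
    err = X @ theta - y
    costs.append((err @ err) / (2 * len(y)))  # J(theta) at this iteration
    theta -= alpha * X.T @ err / len(y)

plt.plot(costs)
plt.xlabel("iteration")
plt.ylabel("cost J(theta)")
plt.show()                                    # the curve should decrease steadily
```
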
-1 votes · 1 answer

How backpropagation through gradient descent represents the error after each forward pass

In a neural network multilayer perceptron, I understand that the main difference between Stochastic Gradient Descent (SGD) and Gradient Descent (GD) lies in how many samples are used in each training step. That is, SGD iteratively chooses one…
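
The contrast in code form: GD computes the error over all samples before one update, while SGD updates after each example's forward pass. A minimal numpy comparison on illustrative data:

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
alpha = 0.05

w_gd = np.zeros(1)
for _ in range(100):                     # GD: one update from the full dataset
    w_gd -= alpha * X.T @ (X @ w_gd - y) / len(y)

w_sgd = np.zeros(1)
for _ in range(100):
    for xi, yi in zip(X, y):             # SGD: one update per training example
        w_sgd -= alpha * xi * (xi @ w_sgd - yi)

print(w_gd, w_sgd)                       # both approach w = 2
```
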
-1 votes · 1 answer

Decrementing the learning rate in the error backpropagation algorithm

This is a more or less general question. In my implementation of the backpropagation algorithm, I start from some "big" learning rate and then decrease it once I see the error start to grow instead of shrink. I am able to do this rate…
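
A systematic version of this heuristic is the "bold driver" schedule: grow the rate slowly while the error falls, and cut it sharply once the error rises. A sketch, with conventional (not question-specific) factors:

```python
def adjust_learning_rate(rate, error, prev_error, grow=1.05, shrink=0.5):
    """Bold-driver schedule: grow the rate slowly while the error falls,
    cut it sharply as soon as the error starts to rise."""
    if prev_error is None or error < prev_error:
        return rate * grow
    return rate * shrink

rate, prev = 0.5, None
for error in [10.0, 8.0, 6.5, 7.2, 5.9]:   # illustrative error trace per epoch
    rate = adjust_learning_rate(rate, error, prev)
    prev = error
print(rate)
```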