Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding a (local) minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and steps in the opposite direction, with a step size proportional to the gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
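
The update rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `gradient_descent`, the fixed `learning_rate`, the fixed number of steps, and the example function f(x) = (x - 3)^2 are all assumptions chosen for the demonstration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Take n_steps steps from x0, each proportional to the negative gradient."""
    x = float(x0)
    for _ in range(n_steps):
        # Move against the gradient; learning_rate sets the proportionality.
        x -= learning_rate * grad(x)
    return x


# Hypothetical example: f(x) = (x - 3)^2 has gradient 2 * (x - 3)
# and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

With a learning rate that is too large the iterates can overshoot and diverge, so in practice the step size usually needs tuning.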

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.
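
To make the contrast concrete, the sketch below fits a one-feature linear regression both ways. Linear regression is a case where the analytic solution does exist, so it is used here only so that the two results can be compared. The function names, the toy data, the learning rate, and the step count are assumptions made for illustration.

```python
def fit_closed_form(xs, ys):
    """Least-squares slope and intercept computed analytically."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, learning_rate=0.05, n_steps=5000):
    """Same model, fitted by gradient descent on the mean squared error."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(n_steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical toy data, roughly y = 2x + 1 with noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]
print(fit_closed_form(xs, ys))
print(fit_gradient_descent(xs, ys))
```

On this data the two approaches should agree to many decimal places. The iterative version becomes the practical choice when no closed form exists or when solving it directly is too expensive.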

Gradient descent is also known as steepest descent, or the method of steepest descent.

Tag usage:

Questions on gradient-descent should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.


1428 questions

votes

2 answers

Write Custom Python-Based Gradient Function for an Operation? (without C++ Implementation)

I'm trying to write a custom gradient function for 'my_op' which for the sake of the example contains just a call to tf.identity() (ideally, it could be any graph). import tensorflow as tf from tensorflow.python.framework import function def…

python tensorflow gradient-descent

asked Aug 08 '16 at 16:10

njk

votes

2 answers

How to implement mini-batch gradient descent in python?

I have just started to learn deep learning. I found myself stuck when it came to gradient descent. I know how to implement batch gradient descent. I know how it works, as well as how mini-batch and stochastic gradient descent work in theory. But really…

python machine-learning neural-network deep-learning gradient-descent

asked Jul 02 '16 at 08:07

savan77

votes

1 answer

Tensorflow understanding tf.train.shuffle_batch

I have a single file of training data, about 100K rows, and I'm running a straightforward tf.train.GradientDescentOptimizer on each training step. The setup is essentially taken directly from Tensorflow's MNIST example. Code reproduced below: x =…

machine-learning tensorflow mathematical-optimization gradient-descent

asked Jun 25 '16 at 20:59

sir_thursday (5,270 reputation; 12 gold, 64 silver, 118 bronze badges)

votes

2 answers

Why is my gradient wrong (Coursera, Logistic Regression, Julia)?

I'm trying to do Logistic Regression from Coursera in Julia, but it doesn't work. The Julia code to calculate the Gradient: sigmoid(z) = 1 / (1 + e ^ -z) hypotesis(theta, x) = sigmoid(scalar(theta' * x)) function gradient(theta, x, y) (m, n)…

gradient julia logistic-regression gradient-descent

asked May 06 '16 at 09:28

Alexey Petrushin (1,311 reputation; 3 gold, 10 silver, 24 bronze badges)

votes

1 answer

How to get a gradient node with mxnet.jl and Julia?

I'm trying to replicate the following example from the mxnet main docs with mxnet.jl in Julia: A = Variable('A') B = Variable('B') C = B * A D = C + Constant(1) # get gradient node. gA, gB = D.grad(wrt=[A, B]) # compiles the gradient function. f =…

julia gradient-descent mxnet mxnet.jl

asked Mar 07 '16 at 09:28

Bernhard Kausler (5,119 reputation; 3 gold, 32 silver, 36 bronze badges)

votes

2 answers

Gradient Descent vs Stochastic Gradient Descent algorithms

I tried to train a feedforward neural network on the MNIST handwritten digits dataset (which includes 60K training samples). On every epoch, I iterated over all the training samples, performing backpropagation for each sample. The runtime is…

machine-learning computer-vision neural-network gradient-descent

asked Feb 29 '16 at 22:49

kuch11

votes

1 answer

theano hard_sigmoid() breaks gradient descent

To highlight the issue, let's follow this tutorial. Theano has 3 ways to compute the sigmoid of a tensor, namely sigmoid, ultra_fast_sigmoid and hard_sigmoid. It seems that using the latter two breaks the gradient descent algorithm. The…

python theano gradient-descent

asked Feb 04 '16 at 01:25

user2255757

votes

2 answers

Understanding softmax classifier

I am trying to understand a simple implementation of Softmax classifier from this link - CS231n - Convolutional Neural Networks for Visual Recognition. Here they implemented a simple softmax classifier. In the example of Softmax Classifier on the…

machine-learning deep-learning gradient-descent calculus softmax

asked Aug 27 '15 at 20:23

Shubhashis (10,411 reputation; 11 gold, 33 silver, 48 bronze badges)

votes

1 answer

Spark mllib predicting weird number or NaN

I am new to Apache Spark and trying to use the machine learning library to predict some data. My dataset right now is only about 350 points. Here are 7 of those…

python apache-spark pyspark apache-spark-mllib gradient-descent

asked Jul 23 '15 at 22:53

Scot Lawrie

votes

1 answer

Neural Network Mini Batch Gradient Descent

I am working with a multi-layer neural network. I intend to do mini-batch gradient descent. Suppose I have mini-batches of 100 over 1 million data points. I don't understand the part where I have to update the weights of the whole network. When I do…

machine-learning neural-network gradient-descent

asked Aug 16 '14 at 19:23

Sasha

votes

1 answer

Multi variable gradient descent

I am learning gradient descent for calculating coefficients. Below is what I am doing: #!/usr/bin/Python import numpy as np # m denotes the number of examples here, not the number of features def gradientDescent(x, y, theta, alpha, m,…

python machine-learning linear-regression gradient-descent

asked Jun 25 '14 at 14:24

user227666

votes

2 answers

Programming logistic regression with stochastic gradient descent in R

I’m trying to program logistic regression with stochastic gradient descent in R. For example, I have followed Andrew Ng's example named “ex2data1.txt”. The point is that the algorithm works properly, but the theta estimates are not exactly…

r regression gradient-descent stochastic

asked Apr 02 '14 at 09:25

user3488416

votes

1 answer

Rescaling after feature scaling, linear regression

Seems like a basic question, but I need to use feature scaling (take each feature value, subtract the mean then divide by the standard deviation) in my implementation of linear regression with gradient descent. After I'm finished, I'd like the…

machine-learning linear-regression gradient-descent

asked Jan 16 '14 at 17:33

Cartesian Theater (1,920 reputation; 2 gold, 29 silver, 49 bronze badges)

votes

1 answer

Multi variable gradient descent in MATLAB

I'm doing gradient descent in MATLAB for multiple variables, and the code is not producing the thetas I got with the normal equation I implemented, which are: theta = 1.0e+05 * 3.4041 1.1063 -0.0665. And with…

matlab machine-learning gradient-descent

asked Oct 20 '13 at 21:04

Pedro.Alonso (1,007 reputation; 3 gold, 20 silver, 41 bronze badges)

votes

1 answer

Why do we multiply learning rate by gradient accumulation steps in PyTorch?

Loss functions in PyTorch use "mean" reduction. This means that the model gradient will have roughly the same magnitude for any batch size. It makes sense that you want to scale the learning rate up when you increase batch size because your…

python deep-learning pytorch gradient-descent learning-rate

asked Mar 10 '23 at 22:15

off99555 (3,797 reputation; 3 gold, 37 silver, 49 bronze badges)
