Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
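
As a minimal sketch of the update rule described above (the function, starting point, and step size are arbitrary choices for illustration), plain gradient descent in Python looks like this:

    # Minimize f(x) = (x - 3)^2 by repeatedly stepping against the derivative.
    def grad_f(x):
        return 2.0 * (x - 3.0)   # derivative of (x - 3)^2

    x = 0.0                      # arbitrary starting point
    learning_rate = 0.1          # the proportionality constant for each step

    for _ in range(100):
        x -= learning_rate * grad_f(x)

    print(x)                     # approaches the minimizer x = 3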


Tag usage:

Questions should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
5
votes
2 answers

TypeError: minimize() missing 1 required positional argument: 'var_list'

I am trying to minimize the loss using SGD in TensorFlow 2.0, but it throws an error; the additional parameter that is causing the issue is var_list. import tensorflow as tf import numpy import matplotlib.pyplot…
Akshay • 81 • 2 • 2 • 9
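
For context, a minimal sketch of the API the error message refers to: in TensorFlow 2.x, tf.keras.optimizers.SGD().minimize expects the loss as a zero-argument callable plus an explicit var_list of variables to update (the data and variable names below are made up for illustration):

    import tensorflow as tf

    # Toy linear fit of y = 3x + 2 with SGD in TensorFlow 2.x eager mode.
    x = tf.constant([[1.0], [2.0], [3.0], [4.0]])
    y = tf.constant([[5.0], [8.0], [11.0], [14.0]])

    w = tf.Variable(0.0)
    b = tf.Variable(0.0)
    opt = tf.keras.optimizers.SGD(learning_rate=0.01)

    def loss_fn():
        return tf.reduce_mean(tf.square(w * x + b - y))

    for _ in range(1000):
        # var_list is the required positional argument from the error message.
        opt.minimize(loss_fn, var_list=[w, b])

    print(w.numpy(), b.numpy())   # drift toward w ≈ 3, b ≈ 2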
5
votes
1 answer

Why doesn't my custom made linear regression model match sklearn?

I'm attempting to create a simple linear model with Python using no libraries (other than numpy). Here's what I have import numpy as np import pandas np.random.seed(1) alpha = 0.1 def h(x, w): return np.dot(w.T, x) def cost(X, W, Y): …
Shamoon • 41,293 • 91 • 306 • 570
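
A minimal numpy-only sketch of the kind of model the question describes, with a bias column and the mean-squared-error gradient, compared against sklearn (data and hyperparameters are invented for illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    np.random.seed(1)
    X = np.random.rand(100, 1)
    y = 4.0 + 3.0 * X[:, 0] + 0.1 * np.random.randn(100)

    # Add a column of ones so the intercept is learned like any other weight.
    Xb = np.hstack([np.ones((100, 1)), X])
    w = np.zeros(2)
    alpha = 0.1

    for _ in range(5000):
        grad = Xb.T @ (Xb @ w - y) / len(y)   # gradient of 1/2 * MSE
        w -= alpha * grad

    ref = LinearRegression().fit(X, y)
    print(w)                          # ~[4.0, 3.0]
    print(ref.intercept_, ref.coef_)  # should roughly match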
5
votes
2 answers

computing gradients for every individual sample in a batch in PyTorch

I'm trying to implement a version of differentially private stochastic gradient descent (e.g., this), which goes as follows: Compute the gradient with respect to each point in the batch of size L, then clip each of the L gradients separately, then…
chirpchirp • 121 • 2 • 8
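
A minimal PyTorch sketch of the per-example clip-then-average step described in the question, assuming a tiny linear model (in practice libraries such as Opacus vectorize this, and the noise-addition step of DP-SGD is omitted here):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    loss_fn = nn.MSELoss()
    x = torch.randn(8, 10)          # batch of L = 8 samples
    y = torch.randn(8, 1)
    max_norm = 1.0

    clipped = [torch.zeros_like(p) for p in model.parameters()]
    for i in range(x.size(0)):
        # Gradient of the loss on this single example only.
        loss = loss_fn(model(x[i:i + 1]), y[i:i + 1])
        grads = torch.autograd.grad(loss, list(model.parameters()))
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(max_norm / (norm + 1e-6), max=1.0)
        for acc, g in zip(clipped, grads):
            acc += g * scale          # accumulate the clipped per-sample gradient

    with torch.no_grad():
        for p, acc in zip(model.parameters(), clipped):
            p -= 0.01 * acc / x.size(0)   # plain SGD step on the averaged gradients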
5
votes
2 answers

How to include a custom filter in a Keras based CNN?

I am working on a fuzzy convolution filter for CNNs. I have the function ready - it takes in the 2D input matrix and the 2D kernel/weight matrix. The function outputs the convolved feature or the activation map. Now, I want to use Keras to build the…
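
One common way to plug a fixed, hand-designed kernel into a Keras model is to create a non-trainable Conv2D layer and overwrite its weights; a minimal sketch (the kernel values and shapes here are placeholders, not the fuzzy filter from the question):

    import numpy as np
    import tensorflow as tf

    # Hypothetical fixed 3x3 kernel, shaped (height, width, in_channels, out_channels).
    kernel = np.array([[1, 0, -1],
                       [2, 0, -2],
                       [1, 0, -1]], dtype=np.float32).reshape(3, 3, 1, 1)

    inp = tf.keras.Input(shape=(28, 28, 1))
    conv = tf.keras.layers.Conv2D(1, (3, 3), padding="same", use_bias=False,
                                  trainable=False, name="fixed_filter")
    out = conv(inp)
    model = tf.keras.Model(inp, out)

    # Replace the randomly initialized kernel with the custom one.
    conv.set_weights([kernel])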
5
votes
1 answer

tf.gradients() sums over ys, does it?

https://www.tensorflow.org/versions/r1.6/api_docs/python/tf/gradients In the documentation for tf.gradients(ys, xs) it states that it "Constructs symbolic derivatives of sum of ys w.r.t. x in xs". I am confused about the summing part; I have read…
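
A small graph-mode sketch (written against tf.compat.v1 so it still runs on TF 2) illustrating the sentence from the docs: tf.gradients(ys, xs) returns the derivative of sum(ys) with respect to xs, not a full Jacobian:

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    x = tf.constant([1.0, 2.0, 3.0])
    y = x * x                                   # a vector of ys

    g1 = tf.gradients(y, x)[0]                  # gradient of sum(y)
    g2 = tf.gradients(tf.reduce_sum(y), x)[0]   # explicit sum for comparison

    with tf.Session() as sess:
        print(sess.run(g1))   # [2. 4. 6.]
        print(sess.run(g2))   # identical: [2. 4. 6.]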
5
votes
2 answers

Confused usage of dropout in mini-batch gradient descent

My question is at the end. An example CNN is trained with mini-batch GD and uses dropout in the last fully-connected layer (line 60) as fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training) At first I thought the tf.layers.dropout or…
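
For reference, a small sketch of the dropout semantics the question is about, using the current tf.keras.layers.Dropout (tf.layers.dropout from the excerpt is the older, deprecated spelling of the same behaviour): rate is the fraction of units dropped, surviving units are scaled by 1/(1 - rate), and nothing is dropped unless training=True:

    import tensorflow as tf

    x = tf.ones((1, 4))
    drop = tf.keras.layers.Dropout(rate=0.5)

    print(drop(x, training=True))    # some entries zeroed, the rest scaled to 2.0
    print(drop(x, training=False))   # inference: input passes through unchanged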
5
votes
1 answer

Difference between GradientDescentOptimizer and AdamOptimizer in tensorflow?

When using GradientDescentOptimizer instead of AdamOptimizer the model doesn't seem to converge. On the other hand, AdamOptimizer seems to work fine. Is there something wrong with the GradientDescentOptimizer from tensorflow? import matplotlib.pyplot…
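
Often nothing is wrong with either optimizer: plain gradient descent just needs a carefully tuned learning rate, while Adam adapts its per-parameter step sizes. A minimal sketch on a toy quadratic (learning rates chosen arbitrarily, TF 2.x style):

    import tensorflow as tf

    def run(optimizer, steps=200):
        w = tf.Variable(5.0)
        for _ in range(steps):
            optimizer.minimize(lambda: (w - 2.0) ** 2, var_list=[w])
        return w.numpy()

    # Plain SGD with a small rate creeps toward the minimum at 2.0 ...
    print(run(tf.keras.optimizers.SGD(learning_rate=0.01)))
    # ... while Adam with its adaptive steps gets there comfortably.
    print(run(tf.keras.optimizers.Adam(learning_rate=0.1)))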
5
votes
1 answer

`warm_start` Parameter And Its Impact On Computational Time

I have a logistic regression model with a defined set of parameters (warm_start=True). As always, I call LogisticRegression.fit(X_train, y_train) and use the model after to predict new outcomes. Suppose I alter some parameters, say, C=100 and call…
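
A small sketch of what warm_start does in scikit-learn (dataset and parameter values are arbitrary): with warm_start=True, a second fit() after set_params() starts from the previously learned coefficients instead of from scratch, which usually means fewer solver iterations:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    clf = LogisticRegression(warm_start=True, max_iter=1000)
    clf.fit(X, y)
    print(clf.n_iter_)      # iterations for the cold start

    clf.set_params(C=100)   # change a hyperparameter, keep the coefficients
    clf.fit(X, y)
    print(clf.n_iter_)      # typically fewer iterations than the first fit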
5
votes
1 answer

keras loss jumps to zero randomly at the start of a new epoch

I'm training a network which has multiple losses and both creating and feeding the data into my network using a generator. I've checked the structure of the data and it looks fine generally and it also trains pretty much as expected the majority of…
tryingtolearn • 2,528 • 7 • 26 • 45
5
votes
0 answers

How to train a model in keras with multiple input-output datasets with different batch sizes

I have a supervised learning problem that I am solving with the Keras functional API. As this model is predicting the state of a physical system, I know the supervised model should follow additional constraints. I would like to add that as an…
5
votes
1 answer

Gradient calculation in Hamming loss for multi-label classification

I am doing a multilabel classification using some recurrent neural network structure. My question is about the loss function: my output will be vectors of true/false (1/0) values to indicate each label's class. Many resources said the Hamming loss…
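
One caveat worth sketching: the Hamming loss counts label mismatches, so it is piecewise constant and its gradient is zero almost everywhere; a common differentiable surrogate is element-wise sigmoid cross-entropy, whose gradient with respect to the logits is simply p - y (a tiny numpy illustration with made-up numbers):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    logits = np.array([2.0, -1.0, 0.5])   # one logit per label
    y      = np.array([1.0,  0.0, 1.0])   # multi-hot target

    p = sigmoid(logits)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad = (p - y) / len(y)               # gradient of the mean loss w.r.t. the logits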
5
votes
0 answers

Implementing Feedback Alignment in Tensorflow

I want to implement Direct Feedback Alignment in Tensorflow. Reference paper: https://arxiv.org/pdf/1609.01596v5.pdf, Nøkland (2016) I implemented a simple network that does DFA in pure Python, with the backprop written out explicitly; I just switched the…
iacolippo • 4,133 • 25 • 37
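
Before wiring it into TensorFlow, the DFA update itself fits in a few lines of numpy; a minimal sketch of the rule from Nøkland (2016), where the hidden layer is trained with a fixed random projection B of the output error rather than the transposed forward weights (sizes and learning rate are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 1))                # one input sample
    y = np.zeros((3, 1)); y[1] = 1.0            # one-hot target

    W1 = rng.normal(scale=0.1, size=(20, 10))
    W2 = rng.normal(scale=0.1, size=(3, 20))
    B  = rng.normal(scale=0.1, size=(20, 3))    # fixed feedback matrix, never trained
    lr = 0.1

    a1 = W1 @ x
    h  = np.tanh(a1)
    z  = W2 @ h
    p  = np.exp(z - z.max()); p /= p.sum()      # softmax

    e   = p - y                                 # output error
    dW2 = e @ h.T                               # standard delta rule at the output
    dW1 = ((B @ e) * (1 - h ** 2)) @ x.T        # DFA: random feedback replaces W2.T

    W2 -= lr * dW2
    W1 -= lr * dW1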
5
votes
2 answers

Steepest descent to find the solution to a linear system with a Hilbert matrix

I am using the method of steepest descent to figure out the solution to a linear system with a 5x5 Hilbert matrix. I believe the code is fine in the regard that it gives me the right answer. My problem is that I think it is taking too many…
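
For reference, steepest descent with exact line search on A x = b looks like the sketch below; the 5x5 Hilbert matrix is severely ill-conditioned, so a large iteration count is expected rather than a bug (conjugate gradients would converge far faster on the same system):

    import numpy as np
    from scipy.linalg import hilbert

    A = hilbert(5)
    b = np.ones(5)
    x = np.zeros(5)

    for _ in range(20000):
        r = b - A @ x                      # residual = negative gradient of 1/2 x'Ax - b'x
        if np.linalg.norm(r) < 1e-12:
            break
        x += (r @ r) / (r @ A @ r) * r     # exact line-search step along the residual

    print(np.linalg.norm(b - A @ x))       # residual shrinks, but only slowly
    print(np.linalg.solve(A, b))           # direct solution for comparison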
5
votes
1 answer

Is there hope for using Lasagne's Adam implementation for Probabilistic Matrix Factorization?

I am implementing Probabilistic Matrix Factorization models in theano and would like to make use of Adam gradient descent rules. My goal is to have a code that is as uncluttered as possible, which means that I do not want to explicitly keep track of…
fstab • 4,801 • 8 • 34 • 66
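
A minimal sketch of handing the bookkeeping to Lasagne: lasagne.updates.adam builds and stores the per-parameter moment estimates itself, so the training function only needs the loss and the list of shared variables (the factor shapes and regularization below are placeholders, not a full PMF model):

    import numpy as np
    import theano
    import theano.tensor as T
    import lasagne

    floatX = theano.config.floatX
    U = theano.shared(np.random.randn(100, 5).astype(floatX))   # user factors
    V = theano.shared(np.random.randn(200, 5).astype(floatX))   # item factors
    R = T.matrix("R")                                           # observed ratings

    loss = T.mean((R - T.dot(U, V.T)) ** 2) + 0.01 * (T.sum(U ** 2) + T.sum(V ** 2))
    updates = lasagne.updates.adam(loss, [U, V], learning_rate=0.01)
    train_step = theano.function([R], loss, updates=updates)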
5
votes
1 answer

Implementing gradient descent in TensorFlow instead of using the one provided with it

I want to use gradient descent with momentum (keep track of previous gradients) while building a classifier in TensorFlow. So I don't want to use tensorflow.train.GradientDescentOptimizer but I want to use tensorflow.gradients to calculate…
prepmath • 69 • 1 • 6
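
A minimal graph-mode sketch (via tf.compat.v1) of a hand-rolled momentum step built from tf.gradients and tf.assign, which is the approach the question describes; variable names and hyperparameters are arbitrary:

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    lr, mu = 0.1, 0.9
    w = tf.Variable(5.0)
    loss = (w - 2.0) ** 2

    grad = tf.gradients(loss, [w])[0]
    velocity = tf.Variable(0.0, trainable=False)

    new_v = tf.assign(velocity, mu * velocity - lr * grad)   # remember past gradients
    train_step = tf.assign_add(w, new_v)                     # apply the momentum step

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(train_step)
        print(sess.run(w))   # close to the minimizer at 2.0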