Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and takes steps in the direction of the negative gradient, with step sizes proportional to the gradient's magnitude. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost).

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
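
As an illustration of the update rule above, here is a minimal NumPy sketch; the step size, iteration count, and example function are arbitrary choices for demonstration, not part of the tag wiki:

    import numpy as np

    def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
        # Repeatedly step in the direction of the negative gradient.
        x = np.asarray(x0, dtype=float)
        for _ in range(n_steps):
            x = x - learning_rate * grad(x)
        return x

    # Example: minimize f(x, y) = (x - 2)^2 + 2*(y - 3)^2, whose gradient is
    # (2*(x - 2), 4*(y - 3)); the minimum is at (2, 3).
    grad_f = lambda v: np.array([2 * (v[0] - 2), 4 * (v[1] - 3)])
    print(gradient_descent(grad_f, x0=[0.0, 0.0]))  # approaches [2. 3.]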


Tag usage:

Questions with this tag should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
0 votes, 4 answers

Gradient Ascent convergence

I am trying to maximise the log of an objective function by the gradient ascent procedure. I am observing an objective value sequence in which the values first increase and then start decreasing again. I wanted to know if this is a possibility?…
Dynamite 341 • 2 • 5 • 17
0 votes, 4 answers

genetic algorithms: name the piece that drives the mutation location

To set up my question, allow me to start with an example: Suppose a set of 1000 arrays (aka, row vectors) all the same length. Each is filled with random numbers between -1 and 1. I then pull 500 of those row vectors at random and sum them up. I now…
Brannon 5,324 • 4 • 35 • 83
0 votes, 1 answer

How to predict using gradient descent

Based on the data below, I'm attempting to use gradient descent to predict what tags will be associated with a new user. Note the numbers are for illustrative purposes only; in actuality these numbers correspond to words. username tags title …
blue-sky 51,962 • 152 • 427 • 752
-1 votes, 2 answers

Back propagation algorithm

I found an example online which contains a method that back-propagates the error and adjusts the weights. I was wondering how exactly this works and what weight update algorithm is used. Could it be gradient descent? /** * all output…
-1 votes, 1 answer

Gradient Descent Algorithm

I recently implemented gradient descent code for linear regression. But when I increase the number of iterations, I instead get increasing values of 'w' and 'c', proportional to the number of iterations. Can anyone please tell me where…
-1 votes, 1 answer

Simple gradient descent in python and numpy

I am trying to implement a simple gradient descent in Python using only numpy, but something is missing and I cannot find it. I have done it in the past, but somehow I have been staring at this problem for the past day without being able to…
-1 votes, 1 answer

Linear regression and gradient descent from scratch python

I am trying to run the following linear-regression-from-scratch code. When I create my object for my linear regression class and call my method, I am getting a TypeError. import numpy as np import pandas as pd import matplotlib.pyplot as plt df =…
-1 votes, 1 answer

How does TensorFlow perform algorithms so fast?

I wanted to solve a linear regression problem by implementing the Adam optimization algorithm myself and running it on my dataset. However, the problem is solved with an acceptable loss in around 100 epochs, while a tenth of my loss is computed…
-1 votes, 1 answer

Gradient descent works in Matlab but not in Python

Matlab version. For the contour plotting: [x1,x2] = meshgrid(-30:0.5:30, -30:0.5:30); F = (x1-2).^2 + 2*(x2 - 3).^2; figure; surf(x1,x2,F); hold on; contour(x1,x2,F); figure; contour(x1,x2,F,20); hold on; For initializing the value of the matrix…
-1 votes, 1 answer

Neural network not learning. Its accuracy always stays constant. What do I need to change to make it learn?

I am coding a classifier neural network from scratch. It is not really learning, and I believe that somewhere there is a gradient explosion/vanishing issue. It could be something else as well that I cannot think of right now. I have coded my own 2000…
-1 votes, 1 answer

numpy matmul is very very slow

I'm trying to implement the gradient descent method for solving $Ax = b$ for a positive definite symmetric matrix $A$ of size about $9600 \times 9600$. I thought my code was relatively simple: #Solves the problem Ax = b for x within epsilon tolerance…
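
For context, a minimal sketch of that approach for a symmetric positive definite system is below. This is a generic illustration of steepest descent with exact line search, assuming NumPy; it is not the asker's code, and the tolerance and iteration cap are placeholder values:

    import numpy as np

    def steepest_descent(A, b, eps=1e-8, max_iter=10_000):
        # Minimizes 0.5*x^T A x - b^T x, whose gradient is A x - b, so the
        # negative gradient is the residual r = b - A x.
        x = np.zeros_like(b, dtype=float)
        r = b - A @ x
        for _ in range(max_iter):
            if np.linalg.norm(r) < eps:
                break
            Ar = A @ r
            alpha = (r @ r) / (r @ Ar)  # exact line search along the residual
            x = x + alpha * r
            r = r - alpha * Ar          # update residual incrementally, saving a matrix-vector product
        return x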
-1 votes, 1 answer

how to prevent gradient vanishing in deep learning?

I am training a CNN model using convolution layers from ResNet50 (I freeze all layers in ResNet50). But the loss won't change across epochs. I think this is called gradient vanishing. I am new to deep learning, so I want to hear from you how we can…
-1 votes, 1 answer

Is this phenomenon gradient vanishing/exploding?

I'm training a deep neural network with N hidden layers. But I found that both train and test accuracy get worse when N becomes large (which means more hidden layers). As far as I know, when a neural network becomes deeper, the model's performance may become…
-1 votes, 1 answer

Partial Derivative term in the Gradient Descent Algorithm

I'm learning the "Machine Learning - Andrew Ng" course from Coursera. In the lesson called "Gradient Descent", I've found the formula a bit complicated. The theorem is consist of "partial derivative" term. The problem for me to understand the…
Tanjim 3 • 1 • 6
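
For readers who have not seen it, the formula that question refers to is usually written as follows; this is the standard statement of the gradient descent update for that course's two-parameter linear regression, not a quote from the question:

$$\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$

Here $\alpha$ is the learning rate, and the partial derivative measures how quickly the cost $J$ changes as the single parameter $\theta_j$ is varied; both parameters are updated simultaneously at each step.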
-1 votes, 1 answer

How is the optimization done when we use zero_grad() in PyTorch?

The zero_grad() method is used when we want to "conserve" RAM with massive datasets. There was already an answer on that here: Why do we need to call zero_grad() in PyTorch?. Gradients are used for the update of the parameters during back prop. But if…
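
For context, the loop in which zero_grad() normally appears looks like the sketch below. The model, data, and hyperparameters are made-up placeholders; the point is only the ordering of zero_grad(), backward(), and step(), where zeroing clears the gradients accumulated by the previous backward() so each update uses only the current batch's gradient:

    import torch

    model = torch.nn.Linear(10, 1)                             # toy model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    x = torch.randn(32, 10)                                    # dummy batch
    y = torch.randn(32, 1)

    for _ in range(100):
        optimizer.zero_grad()   # reset gradients left over from the previous step
        loss = loss_fn(model(x), y)
        loss.backward()         # compute gradients of the loss w.r.t. the parameters
        optimizer.step()        # gradient descent update using those gradients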