Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively calculates the partial derivatives (the gradient) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Wiki:

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function f that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.
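
As a quick illustration of the update rule described above, here is a minimal sketch in Python; the quadratic test function, learning rate, and step count are arbitrary choices for illustration only.

```python
# Minimal gradient descent on the made-up function f(x, y) = x**2 + y**2.
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Take steps proportional to the negative gradient, as described above."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

# The gradient of f(x, y) = x**2 + y**2 is (2x, 2y); the minimum is the origin.
minimum = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0])
print(minimum)  # close to [0, 0]
```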


Tag usage:

Questions tagged [gradient-descent] should be about implementation and programming problems, not about the theoretical properties of the optimization algorithm. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.



1428 questions
0 votes · 1 answer

Why is my loss function returning nan?

So I define this custom loss function in Keras, using a TensorFlow backend, for a background-extraction autoencoder. It's supposed to ensure that the prediction x_hat doesn't stray too far from the median of the predictions taken over the batch…
mdornfe1 · 1,982
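
A hedged aside on the question above: without the full loss it is hard to say why it returns NaN, but a common fix is to clip intermediate values and add a small epsilon before operations whose gradient blows up at zero. The sketch below shows that generic pattern, not the asker's median-based loss; the loss name and constants are made up.

```python
# A generic NaN-guarding pattern for custom Keras losses (illustrative only).
import tensorflow as tf
from tensorflow.keras import backend as K

def safe_custom_loss(y_true, y_pred):
    eps = K.epsilon()                  # small constant, roughly 1e-7
    diff = y_pred - y_true
    # Clipping keeps extreme activations from overflowing to inf/NaN.
    diff = K.clip(diff, -1e6, 1e6)
    # Adding eps inside the sqrt avoids the infinite gradient at exactly zero.
    return K.mean(K.sqrt(K.square(diff) + eps))
```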
0 votes · 1 answer

Gradients are zeros

I am trying to train a network but I always get zero gradients. I am really confused about it and have no idea why this happens. My input data has the shape (batch_size, 120, 10, 3), and after six layers (conv1 - pool1 - conv2 - pool2 - fc1 -…
Vladimir · 525
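
For questions like this one, it usually helps to inspect the gradients directly. Below is a rough sketch using the current tf.GradientTape API (the original post predates it); the toy model and random data are stand-ins for the asker's conv/pool/fc network.

```python
# Inspecting gradients in TensorFlow 2.x (illustrative model and data).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal([4, 120, 10, 3])
y = tf.random.normal([4, 1])

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# All-zero gradients usually point at a saturated activation, a constant loss,
# or variables that are not actually on the path from the inputs to the loss.
for var, g in zip(model.trainable_variables, grads):
    print(var.name, float(tf.reduce_sum(tf.abs(g))))
```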
0 votes · 1 answer

Implementation of steepest descent in Matlab

I have to implement the steepest descent method and test it on functions of two variables, using Matlab. Here's what I did so far: x_0 = [0;1.5]; %Initial guess alpha = 1.5; %Step size iteration_max = 10000; tolerance = 10e-10; % Two anonymous…
wrong_path · 376
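
For comparison, here is a sketch of the same loop in Python rather than Matlab, with a simple backtracking line search; the quadratic test function and all constants are made up and are not the asker's.

```python
import numpy as np

# f(x, y) = x**2 + 10*y**2, a made-up strongly convex test function.
f = lambda p: p[0]**2 + 10 * p[1]**2
grad = lambda p: np.array([2 * p[0], 20 * p[1]])

def steepest_descent(f, grad, x0, alpha0=1.5, tol=1e-10, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # stop when the gradient is tiny
            return x
        alpha = alpha0
        # Backtracking line search: shrink the step until f actually decreases.
        while f(x - alpha * g) >= f(x):
            alpha *= 0.5
        x = x - alpha * g                 # step along the negative gradient
    return x

print(steepest_descent(f, grad, [0.0, 1.5]))   # converges to (0, 0)
```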
0 votes · 1 answer

Gradient Descent Algorithm in Python

I am trying to write a gradient descent function in python as part of a multivariate linear regression exercise. It runs, but does not compute the correct answer. My code is below. I've been trying for weeks to finish this problem but have made zero…
LfB · 17
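
Without the asker's code it is hard to pinpoint the bug, but a minimal working reference for batch gradient descent on multivariate linear regression looks roughly like this (random data, arbitrary hyperparameters):

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.01, n_iters=5000):
    m, n = X.shape
    X_b = np.c_[np.ones(m), X]          # prepend a column of 1s for the bias
    theta = np.zeros(n + 1)
    for _ in range(n_iters):
        error = X_b @ theta - y          # predictions minus targets
        grad = (X_b.T @ error) / m       # gradient of the mean squared error / 2
        theta -= learning_rate * grad
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 4 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
print(gradient_descent(X, y))            # roughly [4, 3, -2]
```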
0 votes · 0 answers

Forcing an optimization to satisfy constraints above all

I'm currently using MATLAB's fmincon with the interior-point algorithm to run this non-linear optimization problem I have. Depending on the initialization, my initial guess may already be quite close to the minimum objective value, while at the same time grossly…
0 votes · 1 answer

How do I rightly code linear regression with gradient descent in Python?

import pandas as pd import matplotlib.pyplot as plt # I'm trying to code the utter basic func of LinearRegression # from sklearn.linear_model import LinearRegression dataframe = pd.read_fwf('brain_body.txt') # link given below x_values =…
0 votes · 1 answer

My TensorFlow Gradient Descent diverges

import tensorflow as tf import pandas as pd import numpy as np def normalize(data): return data - np.min(data) / np.max(data) - np.min(data) df = pd.read_csv('sat.csv', skipinitialspace=True) x_reading = df['reading_score'] x_math =…
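
One plausible culprit in the excerpt above is operator precedence in normalize: without parentheses, only np.min(data) / np.max(data) is divided, which is not min-max scaling, and unscaled features often make gradient descent diverge. A sketch of the presumably intended scaling:

```python
import numpy as np

def normalize(data):
    # Parentheses matter: without them, only np.min(data) / np.max(data)
    # is evaluated before the subtractions, which is not min-max scaling.
    data = np.asarray(data, dtype=float)
    return (data - np.min(data)) / (np.max(data) - np.min(data))

print(normalize([400, 500, 800]))  # [0.0, 0.25, 1.0]
```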
0 votes · 1 answer

Exclude specific tensors being updated by optimizer in TensorFlow

I have two graphs that I am supposed to train independently, which means I have two different optimizers, but at the same time one of them uses the tensor values of the other graph. As a result, I need to be able to stop specific tensors…
saman · 199
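
Two common ways to achieve this are passing var_list to the optimizer and wrapping tensors in tf.stop_gradient. A sketch using the TF1-style API the question appears to use; the variable names and the toy loss are made up.

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, [None, 3])
w_trainable = tf.Variable(tf.random.normal([3, 1]), name="w_trainable")
w_frozen = tf.Variable(tf.random.normal([3, 1]), name="w_frozen")

# Option 1: tf.stop_gradient blocks backpropagation through a tensor.
y = tf.matmul(x, w_trainable) + tf.stop_gradient(tf.matmul(x, w_frozen))
loss = tf.reduce_mean(tf.square(y))

# Option 2: pass var_list so the optimizer only updates the chosen variables.
opt = tf.compat.v1.train.GradientDescentOptimizer(0.01)
train_op = opt.minimize(loss, var_list=[w_trainable])
```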
0 votes · 1 answer

Why "theta" in this code is NaN?

I'm learning neural networks (linear regression) in MATLAB for my research project and this is a part of the code I use. The problem is that the value of "theta" is NaN and I don't know why. Could you tell me where the error is? function [theta,…
0 votes · 0 answers

logistic regression gradient descent not converging

I am trying to implement logistic regression using gradient descent to find the weights of a multivariate function given some data. So far I have come up with the following and the gradientDescent() function works using the meanSquareError() input…
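
One frequent cause of non-convergence here is pairing the sigmoid with a squared-error gradient rather than the cross-entropy gradient. An illustrative sketch of the cross-entropy version on made-up data (not the asker's gradientDescent or meanSquareError functions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, learning_rate=0.1, n_iters=2000):
    m, n = X.shape
    X_b = np.c_[np.ones(m), X]           # bias column
    w = np.zeros(n + 1)
    for _ in range(n_iters):
        p = sigmoid(X_b @ w)             # predicted probabilities
        grad = X_b.T @ (p - y) / m       # gradient of the mean cross-entropy
        w -= learning_rate * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
print(logistic_gradient_descent(X, y))
```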
0 votes · 1 answer

Cross entropy applied to backpropagation in neural network

I watched this awesome video by Dave Miller on making a neural network from scratch in C++ here: https://vimeo.com/19569529 Here is the full source code referenced in the video: http://inkdrop.net/dave/docs/neural-net-tutorial.cpp It uses mean…
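
The key fact for switching such a network from mean squared error to cross entropy is that, with a softmax output layer and a cross-entropy loss, the output-layer error term reduces to output minus target. A small NumPy sketch of just that step (not the tutorial's C++; the pre-activations and target are made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # shift for numerical stability
    return e / e.sum()

z = np.array([1.0, 2.0, 0.5])          # made-up output-layer pre-activations
target = np.array([0.0, 1.0, 0.0])     # one-hot target

output = softmax(z)
delta_output = output - target          # dL/dz for softmax + cross-entropy
print(delta_output)
```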
0 votes · 1 answer

What's the best objective function for the CartPole task?

I'm doing policy gradient and I'm trying to figure out what the best objective function is for the task. The task is the OpenAI CartPole-v0 environment, in which the agent receives a reward of 1 for each timestep it survives and a reward of 0 upon…
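
The objective usually maximized for this task is the expected (discounted) sum of rewards; in REINFORCE the sampled return weights the log-probabilities of the actions taken. Below is a sketch of just the return calculation, with made-up rewards and a common but arbitrary gamma.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return-to-go for each timestep: G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

rewards = [1.0] * 5                    # CartPole gives +1 per surviving step
print(discounted_returns(rewards))     # [4.90..., 3.94..., 2.97..., 1.99, 1.0]
```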
0 votes · 0 answers

Implementing Pre-Conditioned Conjugate Gradient with PETSc

I'm writing some C code to run OLS on a large dataset of approximately 40 million observations. I am interested in using the preconditioned conjugate gradient (PCG) algorithm with an incomplete Cholesky (ICC) decomposition to obtain the…
inb · 1
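
As a rough stand-in for the PETSc/ICC setup, the same idea can be sketched with SciPy rather than PETSc: solve the OLS normal equations with conjugate gradient, preconditioned here by an incomplete LU factorization (spilu) in place of incomplete Cholesky. All sizes and data below are made up.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, spilu, LinearOperator

rng = np.random.default_rng(0)
X = sp.random(10_000, 50, density=0.05, format="csr", random_state=0)
y = rng.normal(size=10_000)

A = (X.T @ X).tocsc()                    # normal-equations matrix (SPD)
b = X.T @ y

ilu = spilu(A)                           # incomplete factorization
M = LinearOperator(A.shape, matvec=ilu.solve)   # preconditioner

beta, info = cg(A, b, M=M, atol=1e-10)
print(info, beta[:5])                    # info == 0 means CG converged
```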
0 votes · 1 answer

Using the delta rule in keras

I'm trying to build a linear single-layer perceptron (i.e. no hidden layers, all inputs connected to all outputs, linear activation function) and train it, one data point at a time, with the delta rule, but I don't get the results that I'm…
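
For a single linear layer trained on squared error, plain SGD with batch_size=1 performs the delta-rule update (up to a constant factor absorbed by the learning rate). A sketch with made-up data and hyperparameters, not the asker's setup:

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)).astype("float32")
y = (X @ np.array([[2.0], [-1.0], [0.5]]) + 0.3).astype("float32")

# One linear output layer, no hidden layers.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="linear")])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")

# batch_size=1 means one weight update per data point, i.e. the delta rule.
model.fit(X, y, epochs=5, batch_size=1, verbose=0)
print(model.get_weights())   # approaches [[2], [-1], [0.5]] and a bias near 0.3
```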
0 votes · 1 answer

Wouldn't setting the first derivative of the cost function J to 0 give the exact theta values that minimize the cost?

I am currently doing Andrew Ng's ML course. From my calculus knowledge, the first derivative test of a function gives critical points if there are any. And considering the convex nature of the linear / logistic regression cost function, it is a given…
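
The two routes can indeed be compared directly: the normal equation theta = (X'X)^-1 X'y gives the closed-form minimizer, while gradient descent iterates toward it; the closed form becomes impractical when X'X is very large or ill-conditioned. A sketch on made-up data, where both recover roughly the same theta:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
y = X @ np.array([4.0, 3.0, -2.0]) + rng.normal(scale=0.1, size=100)

# Closed form: solve (X'X) theta = X'y, i.e. set the derivative to zero.
theta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative: gradient descent on the same least-squares cost.
theta_gd = np.zeros(3)
for _ in range(5000):
    theta_gd -= 0.01 * (X.T @ (X @ theta_gd - y)) / len(y)

print(theta_closed, theta_gd)   # both close to [4, 3, -2]
```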