
In gradient descent, we adjust the weights to reach the global minimum of the error. The error surface has a bowl-like shape, meaning that once the error reaches its minimum value it increases again on the other side of the bowl. But while executing the code, after a certain number of epochs the error doesn't go back up after reaching its minimum value; it stays the same. Can you please clarify?

3 Answers


It could be due to a steadily decreasing learning rate. By the time you have found the optimal solution, your learning rate has decreased significantly, so any new updates made at that point are insignificant; hence it appears that the error does not go up.
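
A rough sketch of this effect (the quadratic cost, starting values, and exponential decay factor below are all illustrative assumptions, not something from the question):

```python
# Minimal sketch (all values hypothetical): minimize cost(w) = w**2 with
# gradient descent while the learning rate decays every epoch.
w = 5.0       # starting weight
lr = 0.1      # initial learning rate
decay = 0.9   # exponential decay factor, chosen only for illustration

for epoch in range(50):
    grad = 2 * w      # derivative of w**2 with respect to w
    w -= lr * grad    # gradient step
    lr *= decay       # learning rate shrinks each epoch

# By now lr * grad is tiny, so w (and the error) has stopped changing,
# which makes the error curve flatten out instead of rising again.
print(w)
```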

Umair Javaid

The error surface of gradient descent can be visualized as a parabola. When we minimize the function, we move toward the vertex, which is the minimum point and our objective. Imagine a ball rolling in a valley: no matter what happens, after some time the ball will stop at the bottom of the valley, where it is lowest. This is gradient descent. Even though the ball could climb up the other side, it has no energy left to do so. This translates directly to the update steps that minimize the loss.
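
A minimal sketch of this picture, assuming a one-dimensional parabola as the cost (the function, starting point, and learning rate are chosen purely for illustration):

```python
# Gradient descent on the parabola cost(w) = (w - 3)**2,
# whose vertex (the minimum) is at w = 3.
w, lr = -4.0, 0.1

for step in range(100):
    grad = 2 * (w - 3)   # slope of the parabola at w
    w -= lr * grad       # step downhill; the step shrinks as the slope does

# The "ball" settles at the vertex and stays there: once the slope is
# (near) zero, the updates are (near) zero, so the loss never climbs back up.
print(round(w, 4))  # ~3.0
```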

Niteya Shah

You really need to study basic differentiation to understand why this is happening, but I'll try to explain. To put it simply:

Assumption: let us assume that the cost is minimised when the weight's value reaches 0.

Case 1: w = -5. Assume that one of the weights has the value -5. Now, if you differentiate the cost with respect to this weight, you will probably get a negative number. You then subtract a small multiple of this gradient from the previous value of the weight: -5 - (some small negative value) is closer to 0, and hence the weight slowly moves toward the function's global minimum.

Case 2: w = 2. Now, coming to your question: what happens when the weight's value goes above 0? Let's say the value of the weight is now 2. If you take the derivative of the cost with respect to the weight, you will get a positive value this time, because of the slope of the function at that point (a calculus concept). Since you subtract a small positive number from 2, it gets closer to 0. So no matter which side of the global minimum the weight is on, once you differentiate and subtract the gradient from the previous value of the weight, it will always move towards the minimum.
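
Here is a small sketch of both cases, assuming the cost is w² so that its derivative is 2w (the learning rate and step count are arbitrary choices for illustration):

```python
# Hypothetical demo of both cases: cost(w) = w**2, minimised at w = 0.
# The gradient 2*w is negative left of 0 and positive right of 0, so the
# update w -= lr * grad moves w toward 0 from either side.
lr = 0.1

for w in (-5.0, 2.0):           # case 1 and case 2 from above
    for _ in range(5):
        grad = 2 * w            # negative when w < 0, positive when w > 0
        w -= lr * grad          # subtracting the gradient pushes w toward 0
        print(f"w = {w:+.4f}")
    print("---")
```

From either starting point the printed values shrink toward 0, matching the argument above: the update direction always points at the minimum.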

Ashwin Prasad