I learnt gradient descent through online resources (namely the Machine Learning course on Coursera). However, the material only said to repeat gradient descent until it converges.
Their definition of convergence was to plot the cost function against the number of iterations and watch for the point where the graph flattens out. Therefore I assume I would do something like the following:
while (change_in_costfunction > precisionvalue) {
    do_one_gradient_descent_step();
}
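To make that concrete, here is a minimal runnable sketch in Python of what I think that loop looks like; the toy data, the learning rate, and the tolerance are values I made up purely for illustration:

import numpy as np

# Toy data (made up for illustration): y = 1 + 2*x, with an intercept column in X
X = np.c_[np.ones(5), np.arange(5)]
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

theta = np.zeros(2)        # coefficients to learn
alpha = 0.05               # learning rate (assumed value)
precisionvalue = 1e-6      # tolerance on the change in cost (assumed value)

def cost(theta):
    residual = X @ theta - y
    return residual @ residual / (2 * len(y))   # squared-error cost

prev_cost = cost(theta)
for _ in range(100_000):                        # iteration cap as a safeguard
    gradient = X.T @ (X @ theta - y) / len(y)   # gradient of the squared-error cost
    theta = theta - alpha * gradient            # one gradient descent step
    new_cost = cost(theta)
    if abs(prev_cost - new_cost) <= precisionvalue:   # cost curve has flattened out
        break
    prev_cost = new_cost

print(theta, cost(theta))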
Alternatively, I was wondering whether another way to determine convergence is to watch each coefficient approach its true value:
while (change_in_coefficient_j > precisionvalue) {
    do_one_gradient_descent_step_for_j();
}
...repeat for all coefficients
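And this is my sketch of the coefficient-based version, stopping once no coefficient moves by more than the tolerance in a single update (again, the data and values are just made-up assumptions):

import numpy as np

# Same toy data as above (made up for illustration)
X = np.c_[np.ones(5), np.arange(5)]
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

theta = np.zeros(2)
alpha = 0.05               # learning rate (assumed value)
precisionvalue = 1e-6      # tolerance on the change in each coefficient (assumed value)

for _ in range(100_000):                        # iteration cap as a safeguard
    gradient = X.T @ (X @ theta - y) / len(y)
    new_theta = theta - alpha * gradient        # one gradient descent step
    # largest absolute change across all coefficients in this update
    if np.max(np.abs(new_theta - theta)) <= precisionvalue:
        theta = new_theta
        break
    theta = new_theta

print(theta)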
So is convergence based on the cost function or on the coefficients? And how do we determine the precision value? Should it be a percentage of the coefficient (or of the total cost), or an absolute number?
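For that last part, this is the distinction I mean between an absolute precision value and a percentage-based (relative) one; the numbers here are only examples, not recommendations:

# Illustration of the two kinds of tolerance I am asking about
old_cost, new_cost = 10.000, 9.995

absolute_change = abs(old_cost - new_cost)         # 0.005
relative_change = absolute_change / abs(old_cost)  # 0.0005, i.e. 0.05% of the cost

converged_absolute = absolute_change < 1e-3        # compares against a fixed number
converged_relative = relative_change < 1e-3        # compares against 0.1% of the cost
print(converged_absolute, converged_relative)      # False, True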