I am writing a simple gradient descent implementation for linear regression on a multivariate data set. While testing the code I noticed that the cost was still decreasing after 5 million iterations, which suggests my learning rate is too small. When I tried to increase the learning rate, the cost value overflowed. After I normalized the data, the problem went away and I could increase the learning rate without getting any error. I was wondering: what is the relation between normalization and the cost overflowing?
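To make the question concrete, here is a simplified sketch of the kind of loop I mean (not my exact code; the names `X`, `y`, `theta`, and `alpha` are just illustrative):

```python
import numpy as np

def compute_cost(X, y, theta):
    # Mean squared error cost: J(theta) = 1 / (2m) * sum((X @ theta - y) ** 2)
    m = len(y)
    errors = X @ theta - y
    return errors @ errors / (2 * m)

def gradient_descent(X, y, theta, alpha, iterations):
    # Batch update: theta := theta - (alpha / m) * X^T @ (X @ theta - y)
    m = len(y)
    for _ in range(iterations):
        errors = X @ theta - y
        theta = theta - (alpha / m) * (X.T @ errors)
    return theta
```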
Gradient descent without normalization (small learning rate)
Data without normalization (bigger learning rate)
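For completeness, this is roughly the normalization step I added before re-running the loop, assuming plain z-score (mean / standard deviation) scaling per feature:

```python
import numpy as np

def normalize_features(X):
    # Z-score normalization: zero mean and unit standard deviation per column,
    # so every feature ends up on a comparable numeric scale.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sigma, mu, sigma

# Usage sketch: normalize first, then run the same loop with a bigger alpha.
# X_norm, mu, sigma = normalize_features(X)
# theta = gradient_descent(X_norm, y, np.zeros(X_norm.shape[1]), alpha=0.1, iterations=1000)
```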