This is a fairly general question. In my implementation of the backpropagation algorithm, I start with a "big" learning rate and decrease it once I see the error start to grow instead of shrinking. I can apply this decrease at either of two points: after the error has grown a bit (StateA), or just before it is about to grow (StateB, a kind of rollback to the previous "successful" state).
So the question is: which is better from a mathematical point of view? Or do I need to run two tests in parallel, say, continue learning from StateA and from StateB, both with the reduced learning rate, and compare which one's error decreases faster?
BTW, I haven't tried the approach from the last paragraph. It only popped into my mind while I was writing this question. In the current implementation I continue learning from StateA with the decreased learning rate, on the assumption that the decrease is small enough that I will still head back in the previous direction, toward the global minimum, in case I had only run into a local minimum.
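
To make the two options concrete, here is a minimal sketch of the comparison described above. It is not my actual network: the 1-D quadratic `loss`, the `decay` factor of 0.5, and the names `train` and `rollback` are all assumptions chosen for illustration. With `rollback=False` the bad step is kept (StateA); with `rollback=True` the weights from before the bad step are restored (StateB).

```python
def loss(w):
    # Toy 1-D quadratic error surface (assumption: stands in for the network error)
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the toy loss above
    return 2.0 * (w - 3.0)


def train(w, lr, steps, decay=0.5, rollback=False):
    """Gradient descent that shrinks lr whenever the error grows.

    rollback=False: keep the weights reached by the bad step (StateA).
    rollback=True:  restore the weights from before the bad step (StateB).
    """
    err = loss(w)
    history = [err]
    for _ in range(steps):
        w_new = w - lr * grad(w)
        err_new = loss(w_new)
        if err_new > err:
            lr *= decay  # error grew: reduce the learning rate
            if rollback:
                # StateB: discard the step, stay at the last good weights
                history.append(err)
                continue
        # Accept the step (always for StateA, only on improvement for StateB)
        w, err = w_new, err_new
        history.append(err)
    return w, lr, history


# Same starting point and initial learning rate for both strategies
w_a, lr_a, hist_a = train(0.0, 1.1, steps=5, rollback=False)  # StateA
w_b, lr_b, hist_b = train(0.0, 1.1, steps=5, rollback=True)   # StateB
print("StateA final error:", hist_a[-1])
print("StateB final error:", hist_b[-1])
```

On this convex toy surface the rollback variant ends with the lower error after the same number of steps, because it does not have to recover from the overshoot. That says nothing about a non-convex error surface with local minima, which is exactly the case I am unsure about.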