I'm studying neural networks and I have some questions about the theory of gradient descent.
I’m voting to close this question because it is not about programming. – TylerH Sep 21 '21 at 13:26
1 Answer
Everything comes down to the trade-off between exploitation and exploration.
Gradient Descent (GD) uses all the data for every weight update, which gives a more reliable gradient. In neural networks, (mini-)Batch Gradient Descent (BGD) is used instead, because running the original full-batch version is not practical on large datasets. Stochastic Gradient Descent (SGD), by contrast, uses only a single example per update, which adds noise. With GD and BGD you exploit more of the data per step.
With SGD you can escape local minima: because each update is based on a single example, the noise favours exploration, so you can reach solutions that BGD could not. With SGD you explore more.
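To make the difference concrete, here is a minimal sketch of the two update rules, assuming a plain linear model with a mean-squared-error loss (the data, the learning rate `lr`, and the weights `w` are purely illustrative):

```python
import numpy as np

# Toy data: 100 samples, 3 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

w = np.zeros(3)   # weights of a simple linear model
lr = 0.1          # learning rate

# Full-batch gradient descent: one update uses ALL samples,
# so the gradient estimate is accurate but each step is expensive.
grad_full = X.T @ (X @ w - y) / len(y)
w_gd = w - lr * grad_full

# Stochastic gradient descent: one update uses a SINGLE sample,
# so the step is cheap but noisy (which helps exploration).
i = rng.integers(len(y))
grad_single = X[i] * (X[i] @ w - y[i])
w_sgd = w - lr * grad_single
```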
BGD takes the dataset and breaks it into N chunks, where each chunk has B samples (B is the batch_size), and updates the weights once per chunk. That still forces you to go through all the data in the dataset.
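A sketch of that chunking, again assuming the same illustrative linear model (the function name and the shuffle step are my additions, not part of the answer above):

```python
import numpy as np

def minibatch_epoch(X, y, w, lr, batch_size):
    """One epoch of mini-batch gradient descent: visit every chunk once."""
    n = len(y)
    order = np.random.permutation(n)            # shuffle so chunks differ per epoch
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]   # one chunk of (at most) B samples
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(yb)   # gradient averaged over the chunk
        w = w - lr * grad                       # one weight update per chunk
    return w

# Example: 100 samples, 3 features, batch_size B = 16
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = minibatch_epoch(X, y, np.zeros(3), lr=0.1, batch_size=16)
```

Because every chunk is visited once per epoch, you still use the whole dataset, but the noise of each update sits between the full-batch and single-sample extremes.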
