I'm starting to play around with https://github.com/openai/baselines/, specifically the deepq algorithm. I wanted to do my own analysis of the parameters passed into the deepq.learn method.
The method has two parameters related to exploration: exploration_fraction and exploration_final_eps.
The way I understand it, exploration_fraction determines what fraction of the training time the algorithm spends exploring, and exploration_final_eps sets the probability of taking a random action each time it explores. So the number of random actions taken for the sake of exploring is determined by the product of exploration_fraction and exploration_final_eps. Is that correct?
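Another possible reading, which I have not confirmed against the baselines source, is that the probability of a random action (epsilon) starts at 1.0, is annealed linearly down to exploration_final_eps over the first exploration_fraction of the training steps, and is then held at that value. A minimal sketch of that reading follows; the helper names epsilon_at and expected_random_actions, the initial_eps default and the example numbers are my own assumptions, not part of the library.

```python
def epsilon_at(step, total_timesteps, exploration_fraction,
               exploration_final_eps, initial_eps=1.0):
    """Hypothetical helper: probability of a random action at a given step.

    Assumes a linear anneal from initial_eps to exploration_final_eps over
    the first exploration_fraction * total_timesteps steps, then a constant.
    """
    anneal_steps = max(1, int(exploration_fraction * total_timesteps))
    progress = min(step / anneal_steps, 1.0)  # clamps to 1.0 once annealing ends
    return initial_eps + progress * (exploration_final_eps - initial_eps)


def expected_random_actions(total_timesteps, exploration_fraction,
                            exploration_final_eps):
    """Expected number of random actions under the assumed schedule."""
    return sum(
        epsilon_at(t, total_timesteps, exploration_fraction, exploration_final_eps)
        for t in range(total_timesteps)
    )


if __name__ == "__main__":
    # Example numbers chosen only for illustration.
    for step in (0, 5_000, 10_000, 50_000):
        print(step, epsilon_at(step, 100_000, 0.1, 0.02))
    print(expected_random_actions(100_000, 0.1, 0.02))
```

Under this reading the two parameters would not simply multiply: exploration_fraction would set how long the anneal lasts, and exploration_final_eps would set the floor that epsilon stays at for the rest of training.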
Can someone explain, in layman's terms, how the algorithm explores based on these two parameters?