I am currently implementing Q-learning to solve a maze that contains fires which ignite at random. Would it be proper to make an action unavailable to the agent when there is a fire in that direction, or should my reward function handle this instead? Thanks
1 Answer
TL;DR: It is absolutely okay to restrict actions.
The set of available actions can be state-dependent. This can be dictated by physical limitations (there is no possibility of entering a wall). A clear-cut example of this is the application of RL to movement on a graph, where each node permits only the actions along its edges (see this: https://education.dellemc.com/content/dam/dell-emc/documents/en-us/2020KS_Nannapaneni-Optimal_path_routing_using_Reinforcement_Learning.pdf).
Additionally, you can restrict actions even when they are allowed (e.g. physically possible) by designing the policy accordingly. In the case of a probabilistic policy, you can assign the "fire" actions probability zero.
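To make this concrete, here is a minimal sketch of how state-dependent action masking could look in tabular Q-learning. The grid size, the `valid_actions` helper, and the `fires` set are all illustrative assumptions, not part of your setup: the key idea is that both the greedy action choice and the bootstrap `max` in the update are taken only over the actions valid in that state.

```python
import numpy as np

# Hypothetical 4x4 grid maze; actions: 0=up, 1=right, 2=down, 3=left.
N = 4
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def valid_actions(state, fires):
    """Return the actions that stay on the grid and avoid burning cells.

    `fires` is an assumed set of (row, col) cells currently on fire.
    """
    r, c = state
    mask = []
    for a, (dr, dc) in enumerate(ACTIONS):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N and (nr, nc) not in fires:
            mask.append(a)
    return mask

def greedy_action(Q, state, fires):
    # Restrict the argmax to actions valid in this state;
    # invalid actions effectively have probability zero.
    acts = valid_actions(state, fires)
    return max(acts, key=lambda a: Q[state][a])

def q_update(Q, s, a, r, s2, fires, alpha=0.1, gamma=0.9):
    # The bootstrap max is also taken only over the next state's
    # valid actions, so masked actions never leak into the target.
    acts = valid_actions(s2, fires)
    target = r + gamma * max(Q[s2][a2] for a2 in acts) if acts else r
    Q[s][a] += alpha * (target - Q[s][a])

# Usage: from the top-left corner with a fire directly to the right,
# only "down" (action 2) remains available.
Q = {(0, 0): [0.0] * 4, (1, 0): [0.0] * 4}
fires = {(0, 1)}
a = greedy_action(Q, (0, 0), fires)
q_update(Q, (0, 0), a, 1.0, (1, 0), fires)
```

The same masking can be applied to an epsilon-greedy policy by sampling the exploratory action uniformly from `valid_actions(state, fires)` instead of from all four actions.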
For deeper reading: https://arxiv.org/pdf/1906.01772.pdf

Karel Macek
Thank you for this. I'm relatively new to the field, and exploring the papers you provided was a great help. – Shabir May 27 '22 at 02:06