
I am currently implementing Q-learning to solve a maze that contains fires which ignite randomly. Would it be considered proper for me to make an action unavailable to the agent when there is a fire in that direction, or should my reward function handle this instead? Thanks

Shabir

1 Answer


TL;DR: It is absolutely okay to restrict actions.

The set of available actions can be state-dependent. Such restrictions often come from physical limitations (e.g. there is no possibility to enter a wall). A radical example of this is the application of RL to movement on a graph, where each state admits only the actions corresponding to its outgoing edges (see this: https://education.dellemc.com/content/dam/dell-emc/documents/en-us/2020KS_Nannapaneni-Optimal_path_routing_using_Reinforcement_Learning.pdf).

Additionally, you can restrict actions even when they are allowed (e.g. physically possible) by designing the policy accordingly. In the case of a probabilistic policy, you can assign the "fire" actions zero probability.
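As a minimal sketch of what this masking could look like in the action-selection step of tabular Q-learning (the function name, signature, and action encoding here are illustrative, not from your implementation):

```python
import random

def select_action(q_values, allowed, epsilon=0.1):
    """Epsilon-greedy action selection restricted to a mask of allowed actions.

    q_values: list of Q(s, a) estimates for the current state
    allowed:  list of action indices valid in this state
              (e.g. directions with no fire and no wall)
    epsilon:  exploration probability
    """
    if random.random() < epsilon:
        # explore, but only among the valid actions
        return random.choice(allowed)
    # exploit: highest-valued action among the allowed ones
    return max(allowed, key=lambda a: q_values[a])
```

The Q-learning update itself stays unchanged; only the choice of actions (and, if you mask the bootstrap target too, the max in the update) is restricted to the valid set for each state.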

For deeper reading: https://arxiv.org/pdf/1906.01772.pdf

Karel Macek
  • Thank you for this. Relatively new to the field and exploring the papers you provided was a great help. – Shabir May 27 '22 at 02:06