I am currently implementing Q-learning to solve a maze that contains fires which ignite at random. Would it be proper to make an action unavailable to the agent when there is a fire in that direction, or should my reward function handle this instead? Thanks
1 Answer
TL;DR: It is absolutely okay to restrict actions.
The set of available actions can be state-dependent. This can be dictated by physical limitations (there is no possibility of entering a wall). A clear-cut example of this is the application of RL to movement on a graph, where each node permits only the actions along its edges (see this: https://education.dellemc.com/content/dam/dell-emc/documents/en-us/2020KS_Nannapaneni-Optimal_path_routing_using_Reinforcement_Learning.pdf).
Additionally, you can restrict actions even when they are allowed (e.g. physically possible) by designing the policy accordingly. In the case of a probabilistic policy, you can assign the "fire" actions probability zero.
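To make this concrete, here is a minimal sketch of how state-dependent action masking could look in tabular Q-learning. The grid size, the `valid_actions` helper, and the `fires` set are all illustrative assumptions, not part of your setup: the key idea is that both the greedy action choice and the bootstrap `max` in the update are taken only over the actions valid in that state.

```python
import numpy as np

# Hypothetical 4x4 grid maze; actions: 0=up, 1=right, 2=down, 3=left.
N = 4
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def valid_actions(state, fires):
    """Return the actions that stay on the grid and avoid burning cells.

    `fires` is an assumed set of (row, col) cells currently on fire.
    """
    r, c = state
    mask = []
    for a, (dr, dc) in enumerate(ACTIONS):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N and (nr, nc) not in fires:
            mask.append(a)
    return mask

def greedy_action(Q, state, fires):
    # Restrict the argmax to actions valid in this state;
    # invalid actions effectively have probability zero.
    acts = valid_actions(state, fires)
    return max(acts, key=lambda a: Q[state][a])

def q_update(Q, s, a, r, s2, fires, alpha=0.1, gamma=0.9):
    # The bootstrap max is also taken only over the next state's
    # valid actions, so masked actions never leak into the target.
    acts = valid_actions(s2, fires)
    target = r + gamma * max(Q[s2][a2] for a2 in acts) if acts else r
    Q[s][a] += alpha * (target - Q[s][a])

# Usage: from the top-left corner with a fire directly to the right,
# only "down" (action 2) remains available.
Q = {(0, 0): [0.0] * 4, (1, 0): [0.0] * 4}
fires = {(0, 1)}
a = greedy_action(Q, (0, 0), fires)
q_update(Q, (0, 0), a, 1.0, (1, 0), fires)
```

The same masking can be applied to an epsilon-greedy policy by sampling the exploratory action uniformly from `valid_actions(state, fires)` instead of from all four actions.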
For deeper reading: https://arxiv.org/pdf/1906.01772.pdf

Karel Macek
Thank you for this. I'm relatively new to the field, and exploring the papers you provided was a great help. – Shabir May 27 '22 at 02:06