
As far as I understand, it is impossible for an agent to learn to avoid dynamic obstacles or to reach dynamic goals, because after the training period the agent follows a static policy that maps each state to an action.

I have implemented a simple grid maze and it confirmed my assumption. For now I am using tabular Q-learning, but I don't think deep Q-learning would do any better.
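To make the setup concrete, here is a minimal sketch of the kind of implementation I mean (the grid size, rewards, and hyperparameters are just illustrative): tabular Q-learning where the state is only the agent's (row, col) position. Because the Q-table is indexed by position alone, the greedy policy extracted after training is a fixed mapping from position to action and cannot react to an obstacle or goal that has moved.

```python
import random

ROWS, COLS = 5, 5
GOAL = (4, 4)                                  # fixed during training
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.95, 0.1, 2000

# Q-table indexed only by the agent's position
Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}

def step(state, a):
    """Move in the grid, clipping at the walls; reward 1 at the goal."""
    r, c = state
    dr, dc = ACTIONS[a]
    ns = (max(0, min(ROWS - 1, r + dr)), max(0, min(COLS - 1, c + dc)))
    reward = 1.0 if ns == GOAL else -0.01
    return ns, reward, ns == GOAL

for _ in range(EPISODES):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])
        ns, r, done = step(s, a)
        # tabular Q-learning update
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[ns]) - Q[s][a])
        s = ns

# After training, the policy is a static lookup: position -> action.
# It prescribes the same action for a given cell no matter where the
# goal or an obstacle currently is.
policy = {s: max(range(len(ACTIONS)), key=lambda i: Q[s][i]) for s in Q}
print(policy[(0, 0)])
```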

Do you have any ideas on how to overcome this problem, so that an agent can learn to avoid dynamic obstacles and reach dynamic goals?

siva
