As far as I understand, it's impossible for an agent to learn to avoid dynamic obstacles or to reach dynamic goals, because after the training period the agent follows a static policy that describes which action to execute in each state.
I have implemented a simple grid maze, and the results seem to confirm my assumption. For now I use tabular Q-learning, but I don't think deep Q-learning would do better.
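To make the setup concrete, here is a minimal sketch of the kind of tabular Q-learning agent I mean (the grid size, rewards, and hyperparameters are illustrative, not my exact code). The point is that the Q-table, and therefore the greedy policy, is indexed only by the agent's cell, so once training stops it cannot react to a goal or obstacle that has moved.

```python
import numpy as np

# Illustrative tabular Q-learning on a small grid world.
# The state is only the agent's (row, col) cell; the goal is fixed
# during training, so the learned greedy policy is a static table.

ROWS, COLS = 5, 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GOAL = (4, 4)                                   # fixed while training
ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.95, 0.1, 2000

Q = np.zeros((ROWS, COLS, len(ACTIONS)))        # tabular Q-values

def step(state, a):
    """Apply action a, clip to the grid, return (next_state, reward, done)."""
    r = min(max(state[0] + ACTIONS[a][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[a][1], 0), COLS - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

rng = np.random.default_rng(0)
for _ in range(EPISODES):
    state, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection
        if rng.random() < EPS:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[state]))
        nxt, reward, done = step(state, a)
        # standard Q-learning update
        Q[state][a] += ALPHA * (reward + GAMMA * np.max(Q[nxt]) * (not done) - Q[state][a])
        state = nxt

# After training, the policy is a fixed mapping: one action per cell.
policy = np.argmax(Q, axis=2)
print(policy)
```

If the goal moves after training, this `policy` table still sends the agent toward `(4, 4)`, which is what I mean by the policy being static.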
Do you have any ideas on how to overcome this problem, so that the agent can learn to avoid dynamic obstacles and reach dynamic goals?