Questions about Q-learning in a 2D maze

Question

I just read about Q-learning and I'm not sure if I understand this correctly. All examples I saw are rat-in-a-maze, where the rat must move towards the cheese, and the cheese doesn't move.

I'm just wondering if it's possible to do Q-learning in a situation where both the mouse and the cheese move (so one agent chases and the other runs away).

If Q-learning doesn't work in that situation, do we have any other algorithms (greedy or non-greedy) that work?

Also, is there a formal/academic name for this situation? I'd like to search for papers that talk about it, but I can't find its formal/academic name.

Thank you so much!

Ideally, if you include some parameters of the target (cheese) like co-ordinates, relative distance, etc. in the state of the Markov Decision Process (MDP), then it might be possible to use Q-Learning to learn to chase the target. — nsidn98, Dec 10 '19 at 16:35

score 0 · Answer 1 · answered Jun 26 '21 at 07:48

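Standard RL algorithms such as Q-learning train a single agent to learn a policy. In a problem with multiple actors, such as a mouse and a cheese, one actor (the mouse) can learn a policy with an RL algorithm while the other actor (the cheese) follows some fixed behavior that is not learned with RL. From the mouse's point of view the moving cheese is then just part of the environment, so Q-learning still applies as long as the state describes where the cheese is. If both the mouse and the cheese are RL agents, then you're looking at multi-agent RL, and the chasing scenario itself is usually studied under the name pursuit-evasion. Here is a nice framework for multi-agent RL: https://github.com/PettingZoo-Team/PettingZoo/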
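To make the single-learner case concrete, here is a minimal sketch of tabular Q-learning where only the mouse learns and the cheese moves at random. It is an illustration under assumptions, not a reference implementation: the 4x4 grid, the reward values, the random cheese behavior, and the names `move` and `train` are all made up for this example. The key point is that the Q-table is indexed by both the mouse's and the cheese's position.

```python
import random
import numpy as np

N = 4  # grid side length (assumed for this sketch)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def move(pos, delta):
    """Apply a move and clamp the result to the grid."""
    r = min(max(pos[0] + delta[0], 0), N - 1)
    c = min(max(pos[1] + delta[1], 0), N - 1)
    return (r, c)

def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1,
          max_steps=50, seed=0):
    rng = random.Random(seed)
    # Q is indexed by (mouse_row, mouse_col, cheese_row, cheese_col, action)
    Q = np.zeros((N, N, N, N, len(ACTIONS)))
    for episode in range(episodes):
        mouse = (rng.randrange(N), rng.randrange(N))
        cheese = (rng.randrange(N), rng.randrange(N))
        for step in range(max_steps):
            if mouse == cheese:
                break
            state = mouse + cheese  # 4-tuple: both positions
            # Epsilon-greedy action selection for the mouse
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = int(np.argmax(Q[state]))
            mouse = move(mouse, ACTIONS[a])
            caught = mouse == cheese
            if not caught:
                # Assumed cheese behavior: a random, non-learning move
                cheese = move(cheese, rng.choice(ACTIONS))
                caught = mouse == cheese
            reward = 1.0 if caught else -0.01  # assumed reward scheme
            # Q-learning target: no bootstrapping on the terminal step
            if caught:
                target = reward
            else:
                target = reward + gamma * Q[mouse + cheese].max()
            Q[state + (a,)] += alpha * (target - Q[state + (a,)])
            if caught:
                break
    return Q
```

After training, the mouse's greedy move in a given situation is `ACTIONS[int(np.argmax(Q[mouse + cheese]))]`. The table grows with the square of the number of cells, which is one reason this approach only scales to small mazes.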

Tabular Q-learning is probably the most popular RL technique for beginners, but in practice it is limited to small problems with a discrete state space, such as a 2D maze. It is not very effective on problems with a continuous state space, even simple ones such as CartPole: the state has to be discretized first, and although Q-learning might then solve the problem, it would take much longer than other RL methods. Q-learning combined with a neural network, however, can be very powerful, as demonstrated by RL methods such as deep Q-network (DQN) and double DQN.
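As a rough illustration of why continuous states are awkward for a Q-table, the sketch below bins a CartPole-style observation of four real numbers into a discrete index. The clipping ranges, the number of bins, and the name `discretize` are assumptions chosen for this example, not values taken from any library.

```python
import numpy as np

# Assumed clipping ranges for (cart position, cart velocity,
# pole angle, pole angular velocity)
LOW = np.array([-2.4, -3.0, -0.21, -3.0])
HIGH = np.array([2.4, 3.0, 0.21, 3.0])
BINS = 6  # bins per dimension (assumed)

# Interior bin edges for each dimension; values outside the range
# fall into the first or last bin.
EDGES = [np.linspace(lo, hi, BINS + 1)[1:-1] for lo, hi in zip(LOW, HIGH)]

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(x, e)) for x, e in zip(obs, EDGES))

# One Q-value per (discretized state, action); two actions: left, right
Q = np.zeros((BINS,) * 4 + (2,))
```

With only 6 bins per dimension the table already has 6^4 * 2 = 2592 entries, and finer bins grow it quickly while each entry is visited less often. That trade-off is a large part of why learning is slow here.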
