I just read about Q-learning and I'm not sure I understand it correctly. All the examples I saw are rat-in-a-maze problems, where the rat must move towards the cheese and the cheese doesn't move.
I'm wondering whether it's possible to use Q-learning in a situation where both the rat and the cheese move (so one agent chases and the other runs away).
If Q-learning doesn't work in that situation, do we have any other algorithms (greedy or non-greedy) that work?
Also, is there a formal/academic name for this situation? I'd like to search for papers that talk about it, but I can't find what it's called.
Thank you so much!