Q-learning is very simple to implement and can easily be applied to explore and solve various environments or games. But as the complexity of the states grows and the number of possible actions increases, the practicality of Q-learning decreases.
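For concreteness, here is a minimal sketch of the tabular version I mean (the state/action counts and hyperparameters are just placeholders):

```python
import numpy as np

n_states, n_actions = 500, 4      # placeholder sizes for a small, discrete problem
alpha, gamma = 0.1, 0.99          # learning rate, discount factor

Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```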
Suppose I have a game (let's take driving a car in GTA as an example) for which I am feeding in states as pre-processed frames and asking the agent to take some action. Here, two problems arise:
- The number of Q-values explodes, since there are a huge number of unique states, each with its own "high"-reward actions.
- The state values themselves are sizable arrays, since they are raw pixel values, which makes them very bulky to store and compare.
So, faced with this many Q-values and such large state representations, the agent would need considerable time just to look up which state it is in and then pick an action, by which point the environment would already have transitioned to a new state (speed is a very important factor here). A rough sketch of what I mean is below.
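To make the size problem concrete, here is what keying a Q-table by raw frames looks like (the 84x84 grayscale frame size and the 9 driving actions are just assumptions for illustration):

```python
import numpy as np

# A single pre-processed frame (84x84 grayscale is an assumed size).
frame = np.random.randint(0, 256, size=(84, 84), dtype=np.uint8)

# Even after heavy preprocessing there are 256**(84*84) possible frames,
# so almost every frame the agent sees is brand new and the table keeps
# growing. Keying the table by the raw pixels still costs one dict
# entry per unique frame:
Q = {}  # maps frame bytes -> array of action values
key = frame.tobytes()
if key not in Q:
    Q[key] = np.zeros(9)  # 9 = hypothetical number of driving actions

print(f"bytes per key alone: {len(key)}")  # 7056 bytes per stored state
```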
So, how would we solve this scenario? I think maybe we could use Monte Carlo methods for this, but they might also be slow. Is there any other solution/algorithm for it? Or can I actually use Q-learning in this scenario? Or maybe I should just get DDR5 RAM and call it a day? I am on DDR3 right now BTW ;)
Any help or guidance?