Q-learning is very simple to implement and can easily be applied to explore and solve various environments or games. But as the complexity of the states grows and the number of possible actions increases, the practicality of Q-learning decreases.
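For concreteness, here is a minimal sketch of the tabular version I mean (the state/action counts and hyperparameters are just placeholders):

```python
import numpy as np

n_states, n_actions = 500, 4      # placeholder sizes for a small, discrete problem
alpha, gamma = 0.1, 0.99          # learning rate, discount factor

Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```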
Suppose I have a game (let's take driving a car in GTA as an example) for which I am feeding in states as pre-processed frames and asking the agent to take some action. Here, two problems arise:
- The number of Q-values explodes, since there are a huge number of unique states, each with its own "high"-reward actions.
- The state values themselves are sizable arrays, since they are raw pixel values, which makes them very bulky to store and compare.
So, faced with this many Q-values and such large state representations, the agent would need considerable time just to look up which state it is in and then pick an action, by which point the environment would already have transitioned to a new state (speed is a very important factor here). A rough sketch of what I mean is below.
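To make the size problem concrete, here is what keying a Q-table by raw frames looks like (the 84x84 grayscale frame size and the 9 driving actions are just assumptions for illustration):

```python
import numpy as np

# A single pre-processed frame (84x84 grayscale is an assumed size).
frame = np.random.randint(0, 256, size=(84, 84), dtype=np.uint8)

# Even after heavy preprocessing there are 256**(84*84) possible frames,
# so almost every frame the agent sees is brand new and the table keeps
# growing. Keying the table by the raw pixels still costs one dict
# entry per unique frame:
Q = {}  # maps frame bytes -> array of action values
key = frame.tobytes()
if key not in Q:
    Q[key] = np.zeros(9)  # 9 = hypothetical number of driving actions

print(f"bytes per key alone: {len(key)}")  # 7056 bytes per stored state
```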
So, how would we solve this scenario? I think maybe we could use Monte Carlo methods for this, but they might also be slow. Is there any other solution/algorithm for it? Or can I actually use Q-learning in this scenario? Or maybe I should just get DDR5 RAM and call it a day? I am on DDR3 right now BTW ;)
Any help or guidance?