Number of Q values for a deep reinforcement learning network

Question

I am currently developing a deep reinforcement learning network however, I have a small doubt about the number of q-values I will have at the output of the NN. I will have a total of 150 q-values, which personally seems excessive to me. I have read on several papers and books that this could be a problem. I know that it will depend from the kind of NN I will build, but do you guys think that the number of q-values is too high? should I reduce it?

what is "number of Q values"? Do you mean you have 150 actions to be taken, and thus for each state s there are 150 Q(a, s) to be estimated? — lejlot, Apr 23 '18 at 20:06

score 1 · Accepted Answer · answered Apr 23 '18 at 20:50

There is no general principle what is "too much". Everything depends solely on the problem and throughput one can get in learning. In particular number of actions does not have to matter as long as internal parametrisation of Q(a, s) is efficient. To give some example lets assume that the neural network is actually of form NN(a, s) = Q(a, s), in other words it accepts action as an input, together with the state, and outputs the Q value. If such an architecture can be trained in a problem considered, than it might be able to scale to big action spaces; on the other hand if the neural net basically has independent output per action, something of form NN(s)[a] = Q(a, s) then many actions can lead to relatively sparse learning signal for the model and thus lead to slow convergence.

Since you are asking about reducing action space it sounds like the true problem has complex control (maybe it is a continuous control domain?) and you are looking for some discretization to make it simpler to learn. If this is the case you will have to follow the typical approach of trial and error - try with simple action space, observe the dynamics, and if the results are not satisfactory - increase the complexity of the problem. This allows making iterative improvements, as opposed to going in the opposite direction - starting with too complex setting to get any results and than having to reduce it without knowing what are the "reasonable values".

Number of Q values for a deep reinforcement learning network

1 Answers1