I have successfully used Q-learning to solve some classic reinforcement learning environments from OpenAI Gym (i.e. Taxi, CartPole). These environments allow for a single action to be taken at each time step. However I cannot find a way to solve problems where multiple actions are taken simultaneously at each time step. For example in the Roboschool Reacher environment, 2 torque values - one for each axis - must be specify at each time step. The problem is that the Q matrix is built from (state, action) pairs. However, if more than one action are taken simultaneously, it is not straightforward to build the Q matrix.
The book "Deep Reinforcement Learning Hands-On" by Maxim Lapan mentions this but does not give a clear answer, see quotation below.
Of course, we're not limited to a single action to perform, and the environment could have multiple actions, such as pushing multiple buttons simultaneously or steering the wheel and pressing two pedals (brake and accelerator). To support such cases, Gym defines a special container class that allows the nesting of several action spaces into one unified action.
Does anybody know how to deal with multiple actions in Q learning?
PS: I'm not talking about the issue "continuous vs discrete action space", which can be tackled with DDPG.