Is it possible to treat the output of one neural network as two or more separate sets of outputs?
Let me explain a bit more (in a Q-learning context):
Imagine I have two agents in the same environment, and each agent has a different number of available actions. Both agents receive the same input vector of environment variables, which they use to choose their actions.
The question is:
Can I use a single neural network to control both agents?
An example:
Agent 1 has 3 available actions and Agent 2 has only 2. An important point is that the agents have to cooperate to maximize the reward. Can I use one neural network with 5 outputs to choose the best action for both agents? That is, the first 3 outputs of the network would be the Q-values for Agent 1 and the other 2 would be the Q-values for Agent 2. My reward function will always be based on the global result; the agents will not have individual rewards.
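To make the idea concrete, here is a minimal sketch of how I picture reading the 5 outputs. It only shows the slicing and the per-agent greedy choice, not the network or the training; the names `N_ACTIONS`, `split_q_values` and `greedy_actions` are made up for illustration, and the output values are invented.

```python
# Hypothetical sizes from the example: 3 actions for Agent 1, 2 for Agent 2.
N_ACTIONS = (3, 2)


def split_q_values(outputs, n_actions=N_ACTIONS):
    """Slice a flat output vector into one list of Q-values per agent."""
    if len(outputs) != sum(n_actions):
        raise ValueError("expected %d outputs, got %d" % (sum(n_actions), len(outputs)))
    heads = []
    start = 0
    for n in n_actions:
        heads.append(list(outputs[start:start + n]))
        start += n
    return heads


def greedy_actions(outputs, n_actions=N_ACTIONS):
    """Pick the highest-valued action independently within each agent's slice."""
    actions = []
    for q in split_q_values(outputs, n_actions):
        actions.append(max(range(len(q)), key=lambda i: q[i]))
    return actions


# Invented network output, for illustration only.
outputs = [0.1, 0.7, 0.2, 0.4, 0.9]
print(split_q_values(outputs))  # [[0.1, 0.7, 0.2], [0.4, 0.9]]
print(greedy_actions(outputs))  # [1, 1]
```

Note that in this sketch each agent's action is chosen from its own slice without looking at the other agent's slice, which is exactly the part I am unsure about.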
Is this possible? I couldn't find anything that discusses it. If you need more details, just ask.
I also know that a possible solution would be a network with 3 * 2 = 6 outputs, where each output corresponds to a pair of actions (one action per agent). But I really want to know whether someone has already done something like what I described above, or whether someone knows that it can't work and why.
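For comparison, the joint-action alternative could be sketched like this, again assuming 3 and 2 actions. The names `JOINT_ACTIONS` and `greedy_joint_action` are hypothetical, and the output values are invented.

```python
from itertools import product

# Hypothetical sizes from the example: 3 actions for Agent 1, 2 for Agent 2.
N_ACTIONS = (3, 2)

# Every (agent 1 action, agent 2 action) pair, one per network output:
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
JOINT_ACTIONS = list(product(range(N_ACTIONS[0]), range(N_ACTIONS[1])))


def greedy_joint_action(outputs):
    """Return the action pair whose single Q-value is the largest."""
    if len(outputs) != len(JOINT_ACTIONS):
        raise ValueError("expected %d outputs, got %d" % (len(JOINT_ACTIONS), len(outputs)))
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return JOINT_ACTIONS[best]


# Invented network output, for illustration only.
outputs = [0.0, 0.3, 0.1, 0.2, 0.8, 0.5]
print(greedy_joint_action(outputs))  # (2, 0)
```

Here one Q-value covers a whole pair of actions, so the number of outputs is the product of the action counts rather than their sum.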