
Is it possible to consider the output of one neural network as two or more sets of outputs?

Let me explain a bit more (in a Q-learning context):

Imagine I have two agents in the same environment, and each agent has a different number of performable actions. Both agents will receive the same input vector of environmental variables to choose their actions.

The question is:

Can I use a single neural network to control both agents?

An example:

Agent 1 has 3 performable actions and Agent 2 has only 2. An important point is that the agents will have to work cooperatively to maximize the reward. Can I use one neural network with 5 outputs to choose the best action for both agents, where the first 3 outputs of the network are the Q-values for Agent 1 and the other 2 are the Q-values for Agent 2? My reward function will always be based on the global result; the agents will not have individual rewards.
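To make the idea concrete, here is a minimal sketch of how the 5 outputs would be split at action-selection time (the Q-values below are made-up numbers, purely for illustration):

```python
import numpy as np

# Hypothetical Q-values produced by a single network with 5 outputs
# (the values themselves are illustrative, not from any trained model).
q_values = np.array([0.2, 1.3, 0.7, 0.5, 0.9])

q_agent1 = q_values[:3]   # first 3 outputs: Q-values for Agent 1's actions
q_agent2 = q_values[3:]   # last 2 outputs: Q-values for Agent 2's actions

action1 = int(np.argmax(q_agent1))  # greedy action for Agent 1
action2 = int(np.argmax(q_agent2))  # greedy action for Agent 2
print(action1, action2)  # -> 1 1
```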

Is it possible? I didn't find anything discussing this approach. If you need more details, just ask.

I also know that a possible solution would be to make a network with 3 × 2 outputs, where each output corresponds to a pair of actions (one action for each agent), but I really want to know if someone has already done something like what I described above, or if someone knows that it can't work and why.

Xeyes

1 Answer


I don't know about this specifically for reinforcement learning, but multi-output neural networks are very common in the literature.

If you want a single network to control both agents, it's probably a good idea to share the early stages of the network before separating it into two distinct branches, with a few layers in each branch.
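A minimal sketch of such a shared-trunk, two-branch network, in plain NumPy with arbitrarily chosen layer sizes and random (untrained) weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only: a 4-dimensional state, an 8-unit shared layer,
# then one head per agent (3 actions for Agent 1, 2 actions for Agent 2).
state_dim, hidden = 4, 8
W_shared = rng.normal(size=(state_dim, hidden))  # shared early layer
W_head1 = rng.normal(size=(hidden, 3))           # branch for Agent 1
W_head2 = rng.normal(size=(hidden, 2))           # branch for Agent 2

def forward(state):
    h = np.tanh(state @ W_shared)    # shared features seen by both agents
    return h @ W_head1, h @ W_head2  # per-agent Q-value heads

q1, q2 = forward(rng.normal(size=state_dim))
print(q1.shape, q2.shape)  # -> (3,) (2,)
```

In a deep-learning framework each branch would typically get its own stack of layers after the shared trunk, but the data flow is the same.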

For an example of how to deal with multiple outputs, you can check out this link.

francoisr
  • Thanks for your answer, I'm going to try both then! Thanks for the link – Xeyes Jun 05 '19 at 13:27
  • If you're satisfied with the answer, please mark it as resolving your problem, so that this question gets resolved. – francoisr Jun 05 '19 at 13:28
  • I'll wait a day or two to see if someone has already experimented with that, because I see one difference from the link you gave me: I will use only one loss because of the common reward – Xeyes Jun 05 '19 at 13:40
  • If you're going to use a single joint loss for both agents, you can probably conceptually consider them as a single agent that encompasses the state of both agents, and then using two branches doesn't really make sense anymore. – francoisr Jun 05 '19 at 13:48
  • Yeah, that's true, but that's why I will try both ideas: one with separate branches and different rewards and losses, and one as you defined it – Xeyes Jun 05 '19 at 13:53