I'm going to implement a reinforcement learning PPO algorithm on a problem whose decision is composed of 4 continuous variables: [a1,a2,a3,a4]. For this, I want to create an actor that provides the mean and variance of a Gaussian distribution for each action. So I was wondering how my neural network should be designed.

I had the idea of a simple NN with 3 layers of 100 nodes that outputs 8 values: input->L1->L2->L3->8outputs, of which the first pair would correspond to the first action, the second pair to the second action, and so on. But this doesn't sound very systematic to me, so I also considered a NN where the last layer contains 4 outputs and I then produce two outputs from each of these four: input->L1->L2->4outputs->(2,2,2,2), one pair for each action. Is there any difference between these approaches? If so, what is it?
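To make the two layouts concrete, here is a minimal forward-pass sketch of both, written in plain Python so it runs without any framework (a real PPO actor would normally be built in something like PyTorch or JAX). Everything not stated above is an assumption made only for illustration: the class names `FlatHeadActor` and `BottleneckActor`, the `tanh` activations, the softplus used to keep each variance positive, and the weight initialisation.

```python
import math
import random


def linear(n_in, n_out, rng):
    """Hypothetical helper: random weights and zero biases for a dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply(layer, x, activation=None):
    """Compute activation(W x + b) for one dense layer."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def softplus(v):
    """Numerically stable log(1 + exp(v)); always positive."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def to_gaussian_params(pairs):
    """Turn raw (mean, raw_variance) pairs into (mean, variance > 0)."""
    return [(m, softplus(r) + 1e-6) for m, r in pairs]


class FlatHeadActor:
    """Option 1: input -> L1 -> L2 -> L3 -> 8 outputs, read as 4 pairs."""

    def __init__(self, n_obs, hidden=100, n_actions=4, seed=0):
        rng = random.Random(seed)
        sizes = [n_obs, hidden, hidden, hidden]
        self.trunk = [linear(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.head = linear(hidden, 2 * n_actions, rng)

    def forward(self, obs):
        h = obs
        for layer in self.trunk:
            h = apply(layer, h, math.tanh)
        raw = apply(self.head, h)  # linear output, no softmax
        pairs = [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]
        return to_gaussian_params(pairs)


class BottleneckActor:
    """Option 2: input -> L1 -> L2 -> 4 outputs -> four (1 -> 2) heads."""

    def __init__(self, n_obs, hidden=100, n_actions=4, seed=0):
        rng = random.Random(seed)
        sizes = [n_obs, hidden, hidden, n_actions]
        self.trunk = [linear(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.heads = [linear(1, 2, rng) for _ in range(n_actions)]

    def forward(self, obs):
        h = obs
        for layer in self.trunk:
            h = apply(layer, h, math.tanh)
        # Each head sees only its own scalar from the 4-unit layer.
        pairs = [tuple(apply(head, [z])) for head, z in zip(self.heads, h)]
        return to_gaussian_params(pairs)


def sample_action(params, rng):
    """Draw one value per action from its Gaussian (std = sqrt(variance))."""
    return [rng.gauss(mean, math.sqrt(var)) for mean, var in params]


obs = [0.1, -0.2, 0.3]  # made-up 3-dimensional observation
params_flat = FlatHeadActor(n_obs=3).forward(obs)
params_bottleneck = BottleneckActor(n_obs=3).forward(obs)
action = sample_action(params_flat, random.Random(1))
```

One difference this sketch makes visible: in option 1 every mean and variance is computed from the full 100-unit last hidden layer, while in option 2 as written here each (mean, variance) pair depends on the observation only through a single scalar, so the two values of a pair cannot vary independently of each other.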

  • Dear Daniel, I am not sure why you would use the first approach. If you only need a real number, one output should be enough. One thing I would be careful about is not to use softmax in your output layer, as it turns the outputs into probabilities that sum to 1. – Israel Zinc Nov 23 '22 at 23:08
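The softmax point in the comment can be checked with a short sketch. The raw values below are invented for illustration; the `softmax` function is the standard definition, written out by hand so the snippet has no dependencies.

```python
import math


def softmax(values):
    """Standard softmax, shifted by the max for numerical stability."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs that were meant to be four action means.
raw_means = [2.5, -1.0, 0.3, 7.0]
squashed = softmax(raw_means)

# After softmax every value lies in (0, 1) and the four values sum to 1,
# so they can no longer represent arbitrary real-valued means.
total = sum(squashed)
```

Leaving the output layer linear, as in the comment's suggestion, keeps the means unconstrained.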

0 Answers