
I have a continuous control problem that I am trying to solve with Multi-Agent Deep Deterministic Policy Gradient (MADDPG). My environment has 7 states and 3 actions: two of the actions range over [0, 1] and the third ranges over [1, 100]. I use a sigmoid activation function on the last layer of the actor network. The algorithm seems to learn nothing and only returns boundary actions, e.g. [1, 100, 0] or [0, 1, 1], and the rewards do not improve. I use Ornstein-Uhlenbeck noise for exploration.
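For reference, this is roughly how I map the sigmoid outputs to the per-dimension action ranges (the function name and the exact logits are just for illustration; the bounds match my setup):

```python
import numpy as np

# Per-dimension action bounds: two actions in [0, 1], one in [1, 100].
ACTION_LOW = np.array([0.0, 0.0, 1.0])
ACTION_HIGH = np.array([1.0, 1.0, 100.0])

def scale_action(sigmoid_out):
    """Affinely rescale sigmoid outputs in (0, 1) to each action's range."""
    return ACTION_LOW + sigmoid_out * (ACTION_HIGH - ACTION_LOW)

# Example: sigmoid of some illustrative actor logits, then rescaling.
logits = np.array([0.0, 2.0, -2.0])
raw = 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> values in (0, 1)
action = scale_action(raw)            # first component is exactly 0.5
```

So an action only reaches a boundary like [0, 1, 1] when the sigmoid itself saturates at 0 or 1, i.e. when the pre-activation logits become very large in magnitude.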

What I have tried to do:

  1. I have experimented a lot with my hyperparameters.
  2. I have clipped the gradients.
  3. I have used prioritized experience replay.
  4. I have target networks for both actor and critic.

but the problem has not been solved yet.
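To be concrete about items 2 and 4, the gradient clipping and the target-network updates I use look roughly like this (the threshold and tau values shown are illustrative, not my exact hyperparameters):

```python
import numpy as np

TAU = 0.01        # soft-update rate (illustrative value)
CLIP_NORM = 1.0   # gradient-norm clip threshold (illustrative value)

def clip_grad_by_norm(grad, max_norm=CLIP_NORM):
    """Rescale the gradient vector if its L2 norm exceeds max_norm (item 2)."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

def soft_update(target_params, online_params, tau=TAU):
    """Polyak-average target parameters toward the online ones (item 4)."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]

# Example: a gradient of norm 5 is rescaled to norm 1, direction preserved.
g = clip_grad_by_norm(np.array([3.0, 4.0]))
```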

Any reply or reference that can help me will be appreciated.

at-dgh