I have a continuous control problem that I am trying to solve with multi-agent deep deterministic policy gradient (MADDPG). My environment has a 7-dimensional state and 3 continuous actions: two of the actions lie in [0, 1] and one lies in [1, 100]. I use a sigmoid activation on the last layer of the actor network. The algorithm does not seem to learn anything: it only returns boundary actions, e.g. [1, 100, 0] or [0, 1, 1], and the rewards do not improve. I use Ornstein-Uhlenbeck noise for exploration.
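To make the action setup concrete, here is a minimal sketch of how I map the actor's sigmoid output onto the different action ranges (the helper names and the standalone `sigmoid` are simplifications of my actual network code):

```python
import numpy as np

# Per-dimension action bounds: two actions in [0, 1], one in [1, 100].
ACTION_LOW = np.array([0.0, 0.0, 1.0])
ACTION_HIGH = np.array([1.0, 1.0, 100.0])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scale_action(last_layer_preactivation):
    """Squash the actor's final pre-activation through a sigmoid,
    then rescale each dimension to its own [low, high] range."""
    unit = sigmoid(last_layer_preactivation)   # each component in (0, 1)
    return ACTION_LOW + unit * (ACTION_HIGH - ACTION_LOW)

a = scale_action(np.array([0.3, -1.2, 2.0]))
```

The saturation I observe corresponds to the sigmoid being driven to its flat regions, so the scaled actions stick to the boundaries.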
What I have tried to do:
- I have experimented extensively with the hyperparameters.
- I have clipped the gradients.
- I have used prioritized experience replay.
- I have target networks for both actor and critic.
but the problem persists.
Any reply or reference that could help would be appreciated.
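For reference, my exploration noise follows the standard Ornstein-Uhlenbeck process; a sketch of what I use is below (the parameter values here are illustrative, not the ones from my runs):

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration
    noise that mean-reverts toward mu at rate theta."""
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at the mean at each new episode.
        self.state = self.mu.copy()

    def sample(self):
        dx = (self.theta * (self.mu - self.state)
              + self.sigma * self.rng.standard_normal(len(self.state)))
        self.state = self.state + dx
        return self.state

noise = OUNoise(3)
n = noise.sample()
```

The noise is added to the actor's output before the action is clipped to its valid range.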