I am trying to use Stable Baselines to train a PPO2 agent with MlpPolicy. After 100k timesteps, I only get actions of 1 and -1. I restricted the action space to [-1, 1] and use the action directly as the control signal. Could the problem be that I use the action directly as the control?
- What environment are you training it on? – nsidn98 Nov 23 '20 at 11:24
- @qwererer did you solve this problem? – Fatemeh Karimi Dec 25 '20 at 06:58
1 Answer
This could be the result of the Gaussian distribution PPO2 uses for continuous action spaces: actions sampled outside the bounds are clipped, so they pile up at -1 and 1. You could switch to an algorithm that doesn't use a Gaussian, or use PPO with a different distribution (e.g. a Beta distribution).
Check out the example here: https://github.com/hill-a/stable-baselines/issues/112 and this paper: https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf
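A quick way to see the clipping effect in isolation (a minimal NumPy sketch, not Stable Baselines itself; the loc/scale values are illustrative assumptions):

```python
import numpy as np

# A Gaussian policy whose standard deviation is large relative to the
# action bounds puts most of its probability mass outside [-1, 1].
# After clipping (done to respect the Box action space), those samples
# pile up at the boundaries -- the -1/1 behaviour from the question.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=3.0, size=100_000)  # wide Gaussian
clipped = np.clip(samples, -1.0, 1.0)

# Fraction of actions that ended up exactly at the bounds
at_bounds = np.mean(np.abs(clipped) == 1.0)
print(f"fraction of actions clipped to -1 or 1: {at_bounds:.2f}")
```

With these numbers, most sampled actions land on a boundary, which is why a bounded distribution such as the Beta (supported exactly on the action interval) avoids the problem.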

Nico Bohlinger