I am trying to use Stable Baselines to train a PPO2 agent with MlpPolicy. After 100k timesteps, I only get actions of 1 and -1. I restricted the action space to [-1, 1] and use the action directly as the control signal. Could the problem be that I use the action directly as the control?
- What environment are you training it on? – nsidn98 Nov 23 '20 at 11:24
- @qwererer did you solve this problem? – Fatemeh Karimi Dec 25 '20 at 06:58
1 Answer
This could be the result of the Gaussian distribution PPO2 uses for continuous action spaces: actions sampled outside the bounds are clipped, so they pile up at -1 and 1. You could switch to an algorithm that doesn't use a Gaussian, or use PPO with a different distribution (e.g. a Beta distribution).
Check out the example here: https://github.com/hill-a/stable-baselines/issues/112 and this paper: https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf
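A quick way to see the clipping effect in isolation (a minimal NumPy sketch, not Stable Baselines itself; the loc/scale values are illustrative assumptions):

```python
import numpy as np

# A Gaussian policy whose standard deviation is large relative to the
# action bounds puts most of its probability mass outside [-1, 1].
# After clipping (done to respect the Box action space), those samples
# pile up at the boundaries -- the -1/1 behaviour from the question.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=3.0, size=100_000)  # wide Gaussian
clipped = np.clip(samples, -1.0, 1.0)

# Fraction of actions that ended up exactly at the bounds
at_bounds = np.mean(np.abs(clipped) == 1.0)
print(f"fraction of actions clipped to -1 or 1: {at_bounds:.2f}")
```

With these numbers, most sampled actions land on a boundary, which is why a bounded distribution such as the Beta (supported exactly on the action interval) avoids the problem.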

Nico Bohlinger