I'm implementing PPO2 reinforcement learning on my self-built tasks, and I keep running into situations where the agent seems nearly mature, then suddenly and catastrophically loses its performance and cannot recover a stable level. I don't know the right word for it.

I'm just wondering what could be the cause for such catastrophic drop in performance? Any hints or tips?

Many thanks

[Screenshots: learningprocess1, learningprocess2]

1 Answer

I would guess that your reward function is not capped and can produce extremely large negative rewards in some edge cases.

Two things to prevent this are:

  1. Limit (clip) the values returned by your reward function.
  2. Make sure you can handle situations where your learning environment becomes unstable, e.g. the process crashed, froze, or hit a bug. For example, if you give your agent a negative reward when it falls (a robot learning to walk) and the environment fails to detect the fall because of some rare bug, then your reward function keeps giving negative rewards until the episode ends.
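A minimal sketch of point 1, assuming a Gym-style loop where you can intercept the reward before it reaches the learner (the bounds here are illustrative, not values from the question):

```python
def clip_reward(reward, low=-10.0, high=10.0):
    """Bound the raw reward so that rare edge cases cannot
    produce an extreme value that dominates the policy update.
    The [-10, 10] range is an example; pick bounds that match
    the normal scale of your task's rewards."""
    return max(low, min(high, float(reward)))

# Typical use inside the environment step loop:
# obs, reward, done, info = env.step(action)
# reward = clip_reward(reward)
```

In practice you would wrap this into your environment (e.g. as a `gym.RewardWrapper`) so every algorithm sees the clipped signal automatically.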

Most of the time this is not a big deal, but if you are unlucky your environment could even produce NaN values, and those will corrupt your network weights.
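One way to guard against this is to validate each transition before it enters training. The sketch below assumes a flat list-of-floats observation; the recovery policy (zero the reward and end the episode) is just one reasonable choice:

```python
import math

def sanitize_transition(obs, reward, done):
    """Reject or repair non-finite values before they reach the
    optimizer; a single NaN/inf in the batch can silently corrupt
    the network weights."""
    if any(not math.isfinite(x) for x in obs):
        # A broken observation usually means the simulator itself
        # failed, so failing loudly is safer than patching it.
        raise RuntimeError("non-finite observation from environment")
    if not math.isfinite(reward):
        # Example recovery: drop the bad reward and end the episode
        # so the agent does not accumulate garbage signal.
        reward, done = 0.0, True
    return obs, reward, done
```

Checking eagerly like this makes the failure visible at the step where it happened, instead of showing up later as an unexplained collapse of the learning curve.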