Several dips in accumulated episodic rewards during training of a reinforcement learning agent

Question

Hi I am training reinforcement learning agents for a control problem using PPO algorithm. I am tracking the accumulated rewards for each episode during the training process. Several times during the training process I see a sudden dip in the accumulated rewards. I am not able to figure out why this is happening or how to avoid this. Tried with changing some of the hyper parameters like changing the number of neurons in the neural network layers, learning rate etc.. but still I see this happening consistently. If I debug and check the actions that are being taken during dips, obviously actions are very bad hence causing a decrease in rewards.

Can some one help me with understanding why this is happening or how to avoid this ?

Some of plots of my training process

score 1 · Answer 1 · answered Nov 25 '19 at 15:23

I recently read this paper: https://arxiv.org/pdf/1805.07917.pdf I haven't used this method in particular, so I can't really vouch for the usefulness, but the explanation to this problem seemed convincing to me:

For instance, during the course of learning, the cheetah benefits from leaning forward to increase its speed which gives rise to a strong gradient in this direction. However, if the cheetah leans too much, it falls over. The gradient-based methods seem to often fall into this trap and then fail to recover as the gradient information from the new state has no guarantees of undoing the last gradient update.

Several dips in accumulated episodic rewards during training of a reinforcement learning agent

1 Answers1