How do you evaluate whether a reinforcement learning agent is trained well or not?

Question

I am new to training reinforcement learning agents. I have read about the PPO algorithm and used the Stable Baselines library to train an agent with PPO. My question is: how do I evaluate a trained RL agent? For a regression or classification problem I have metrics such as r2_score or accuracy. Are there similar metrics for RL, or how else do I test the agent and conclude whether it is trained well or badly?

Thanks

score 0 · Answer 1 · answered Oct 31 '19 at 13:37

You can run your environment with a random policy, and then run the same environment with the same random seed using the trained PPO model. Comparing the accumulated rewards of the two runs gives you an initial sense of how well the trained model performs.
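A minimal sketch of that comparison in plain Python, assuming a Gym-style reset/step interface. `CorridorEnv`, `run_episode` and the always-move-right `trained_policy` are hypothetical stand-ins; with Stable Baselines you would step your real environment and get actions from `model.predict(obs, deterministic=True)[0]` instead. The important part is that both policies are scored on the same list of seeds, so a difference in return comes from the policy and not from a luckier draw of the environment's randomness.

```python
import random
from statistics import mean

class CorridorEnv:
    """Hypothetical toy environment standing in for a real Gym env.

    The agent starts at position 0 and must reach position `goal`.
    Actions: 0 = step left, 1 = step right. Each non-terminal step gives -1,
    reaching the goal gives +10 and ends the episode; episodes are also
    truncated after `max_steps`.
    """

    def __init__(self, goal=5, max_steps=50):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self, seed=None):
        self.rng = random.Random(seed)  # seeding makes the episode reproducible
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, action):
        # With probability 0.1 the action is flipped, so the seed matters.
        if self.rng.random() < 0.1:
            action = 1 - action
        self.pos += 1 if action == 1 else -1
        self.t += 1
        if self.pos >= self.goal:
            return self.pos, 10.0, True
        return self.pos, -1.0, self.t >= self.max_steps

def run_episode(env, policy, seed):
    """Play one episode and return the accumulated (undiscounted) reward."""
    obs = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

def evaluate(policy, seeds):
    env = CorridorEnv()
    return [run_episode(env, policy, s) for s in seeds]

policy_rng = random.Random(0)
random_policy = lambda obs: policy_rng.randint(0, 1)
trained_policy = lambda obs: 1  # hypothetical stand-in for model.predict(obs)

seeds = list(range(20))  # identical seeds for both policies
random_returns = evaluate(random_policy, seeds)
trained_returns = evaluate(trained_policy, seeds)
print(f"random policy : mean return {mean(random_returns):.2f}")
print(f"trained policy: mean return {mean(trained_returns):.2f}")
```

With a real environment, use enough episodes that the gap between the two averages is clearly larger than the episode-to-episode spread.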

Since you are using PPO, you might also want to track the gradients and the KL divergence values over the course of training, to see whether you have a well-defined threshold for accepting a gradient step. If very few gradient steps are accepted, you might want to adjust your hyperparameters.
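As a sketch of what those diagnostics measure: given the log-probabilities of a minibatch of sampled actions under the policy that collected the data and under the updated policy, you can compute an approximate KL divergence and the fraction of samples whose probability ratio left the clipping range. The function below is illustrative, not the library's code; it uses the commonly used non-negative estimator mean((r - 1) - log r), where r is the new/old probability ratio, and `clip_range=0.2` is an assumed value. Stable Baselines' PPO2 logs similar quantities under the names `approxkl` and `clipfrac`, so in practice you would usually read them from its logs or TensorBoard rather than recompute them.

```python
import math

def ppo_update_diagnostics(old_logp, new_logp, clip_range=0.2):
    """Return (approx_kl, clip_fraction) for one minibatch of sampled actions.

    old_logp: log pi_old(a|s) recorded when the data was collected.
    new_logp: log pi_new(a|s) for the same (s, a) pairs after the update.
    """
    n = len(old_logp)
    log_ratios = [new - old for old, new in zip(old_logp, new_logp)]
    ratios = [math.exp(lr) for lr in log_ratios]
    # Non-negative estimator of KL(pi_old || pi_new): mean of (r - 1) - log r.
    approx_kl = sum((r - 1.0) - lr for r, lr in zip(ratios, log_ratios)) / n
    # Share of samples whose ratio left [1 - clip_range, 1 + clip_range],
    # a rough indicator of how often PPO's clipping is in play.
    clip_fraction = sum(abs(r - 1.0) > clip_range for r in ratios) / n
    return approx_kl, clip_fraction

# Hypothetical log-probabilities for four sampled actions.
old_logp = [-0.7, -1.2, -0.3, -2.0]
new_logp = [-0.6, -1.5, -0.3, -1.6]
kl, frac = ppo_update_diagnostics(old_logp, new_logp)
print(f"approx KL = {kl:.4f}, clip fraction = {frac:.2f}")
```

As a rough heuristic, a KL and clip fraction that both stay near zero often mean the updates barely move the policy, while a large KL with a high clip fraction often means the steps are too aggressive.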

score 0 · Answer 2 · answered Feb 17 '20 at 22:55

A good way to evaluate an RL agent is to run it in the environment N times (N episodes) and calculate the average return over those N runs.
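A minimal sketch of that evaluation loop, assuming `run_episode(seed)` is a function you write that plays one full episode with the trained policy and returns its total reward (the `fake_episode` below is a hypothetical placeholder so the snippet runs on its own). Reporting the standard deviation and the standard error next to the mean tells you whether N is large enough for the average to be trusted. If you are on stable-baselines3, its `evaluate_policy(model, env, n_eval_episodes=...)` helper returns the mean and standard deviation of episode rewards for you.

```python
import random
import statistics

def evaluate(run_episode, n_episodes=10, base_seed=0):
    """Run `n_episodes` episodes and summarize their returns.

    run_episode(seed) -> float is assumed to play one full episode with the
    trained policy and return the sum of rewards.
    Returns (mean, standard deviation, standard error of the mean).
    """
    returns = [run_episode(base_seed + i) for i in range(n_episodes)]
    mean = statistics.mean(returns)
    std = statistics.stdev(returns) if n_episodes > 1 else 0.0
    # Standard error: how much the average itself would vary between evaluations.
    sem = std / n_episodes ** 0.5
    return mean, std, sem

def fake_episode(seed):
    # Hypothetical stand-in for stepping a real env with the trained model.
    rng = random.Random(seed)
    return sum(rng.choice([0.0, 1.0]) for _ in range(100))

avg, std, sem = evaluate(fake_episode, n_episodes=30)
print(f"average return over 30 episodes: {avg:.1f} ± {sem:.1f} (std {std:.1f})")
```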

It is common to run this evaluation periodically throughout training and plot the average return as training progresses. You would expect the average return to go up, which indicates that training is doing something useful.
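The same idea during training, as a toy sketch: instead of PPO, a 3-armed bandit learner with epsilon-greedy value estimates is evaluated every `eval_every` steps with exploration switched off and a fixed evaluation seed. The arm probabilities, `train` and `evaluate_greedy` are all hypothetical; the point is the evaluation schedule, not the learner. The resulting list of (step, average return) pairs is the learning curve you would plot. With stable-baselines3, `EvalCallback` plays a similar role for real agents.

```python
import random
import statistics

ARM_PROBS = [0.2, 0.5, 0.8]  # hypothetical success probability of each arm

def pull(arm, rng):
    """Bernoulli reward: 1.0 with the arm's success probability, else 0.0."""
    return 1.0 if rng.random() < ARM_PROBS[arm] else 0.0

def greedy_arm(q):
    return max(range(len(q)), key=lambda a: q[a])

def evaluate_greedy(q, n_episodes=20, steps_per_episode=10, seed=123):
    """Average return of the current greedy policy, with exploration off.

    The fixed seed means every evaluation sees the same random numbers.
    """
    rng = random.Random(seed)
    arm = greedy_arm(q)
    returns = [sum(pull(arm, rng) for _ in range(steps_per_episode))
               for _ in range(n_episodes)]
    return statistics.mean(returns)

def train(total_steps=2000, eval_every=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0] * len(ARM_PROBS)      # estimated value of each arm
    counts = [0] * len(ARM_PROBS)
    curve = [(0, evaluate_greedy(q))]  # performance before any training
    for step in range(1, total_steps + 1):
        # Epsilon-greedy action selection during training.
        arm = rng.randrange(len(q)) if rng.random() < epsilon else greedy_arm(q)
        reward = pull(arm, rng)
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]  # incremental sample average
        if step % eval_every == 0:
            curve.append((step, evaluate_greedy(q)))
    return curve

curve = train()
for step, avg_return in curve:
    print(f"step {step:5d}: average return {avg_return:.2f}")
```

Evaluating without exploration and with a fixed seed keeps successive points on the curve comparable, so an upward trend reflects learning rather than evaluation noise.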

For example, in Figure 3 of the PPO paper, the authors plotted the average return against the number of training steps to show that PPO performs better than the other algorithms they compared.

