I am training several agents with the PPO algorithm in a multi-agent environment using RLlib/Ray. I am using ray.tune() to train the agents and then loading the training data from ~/ray_results. This data contains the actions chosen by the agents in each training episode, but I also need the corresponding agent rewards. I've looked at the documentation, but there doesn't seem to be a configuration argument that allows for saving episode rewards. Does anyone have a workaround for this?
mat123a
3 Answers
You need to add these values to the info dict, and they will then be collected by Ray Tune.
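For example, a minimal sketch of how per-agent rewards can be surfaced through RLlib's callbacks, which is the usual way to get custom values into the reported metrics. The import path is from the pre-2.x API (it later moved to ray.rllib.algorithms.callbacks), and MyMultiAgentEnv is a placeholder for your own registered environment:

from ray import tune
from ray.rllib.agents.callbacks import DefaultCallbacks

class RewardLoggingCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # episode.agent_rewards maps (agent_id, policy_id) -> total episode reward
        for (agent_id, policy_id), reward in episode.agent_rewards.items():
            # custom_metrics are aggregated (mean/min/max) per training iteration;
            # assign a list to episode.hist_data instead to keep per-episode values
            episode.custom_metrics["agent_{}_reward".format(agent_id)] = reward

tune.run(
    "PPO",
    config={
        "env": MyMultiAgentEnv,  # placeholder for your multi-agent env
        "callbacks": RewardLoggingCallbacks,
    },
)

The resulting custom_metrics columns then show up in progress.csv and result.json alongside the built-in reward stats.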

Rocket
Did you check progress.csv and result.json? The details of the reward for each agent in every episode can be found there.
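For instance, a small sketch of reading the rewards back out of result.json, which is newline-delimited with one JSON record per training iteration. The trial directory name below is a placeholder for your actual run:

import json
import os

result_path = os.path.expanduser(
    "~/ray_results/PPO/my_trial/result.json")  # placeholder trial directory

with open(result_path) as f:
    for line in f:  # one JSON record per training iteration
        row = json.loads(line)
        hist = row.get("hist_stats", {})
        # hist_stats holds per-episode lists: episode_reward,
        # episode_lengths, and one policy_<id>_reward list per policy
        print(hist.get("episode_reward"), hist.get("policy_0_reward"))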

vwaq
The episode_reward in result.json is by default the sum of all agent rewards per episode, and each policy_<id>_reward is the sum of the rewards of all agents assigned to that policy.
Example for 2 agents:
"hist_stats": {
"episode_reward": [527.0, 399.0, 165.0, 8.0, 268.0, 138.0, 154.0, 846.0],
"episode_lengths": [50, 50, 50, 50, 50, 50, 50, 50],
"policy_0_reward": [0.0, 0.0, 0.0, 8.0, 240.0, 138.0, 0.0, 0.0],
"policy_1_reward": [527.0, 399.0, 165.0, 0.0, 28.0, 0.0, 154.0, 846.0]
},
Alternatively, what you could do is change the summarize_episodes function in RLlib's metrics.py accordingly.
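As a rough sketch of what such a change could look like, assuming an older RLlib where summarize_episodes lives in ray/rllib/evaluation/metrics.py and each RolloutMetrics entry carries an agent_rewards field. Since Tune runs trainers in separate processes, actually editing the installed metrics.py (as suggested above) is more reliable than monkey-patching from the driver; the patch below only illustrates the shape of the change:

from ray.rllib.evaluation import metrics as rllib_metrics

_original_summarize = rllib_metrics.summarize_episodes

def summarize_episodes_with_agents(episodes, new_episodes=None, **kwargs):
    result = _original_summarize(episodes, new_episodes, **kwargs)
    # each RolloutMetrics entry has agent_rewards:
    # (agent_id, policy_id) -> total reward for that agent in that episode
    per_agent = {}
    for ep in episodes:
        for (agent_id, _), rew in ep.agent_rewards.items():
            per_agent.setdefault(
                "agent_{}_reward".format(agent_id), []).append(rew)
    # expose the per-agent lists next to the built-in hist_stats entries
    result["hist_stats"].update(per_agent)
    return result

rllib_metrics.summarize_episodes = summarize_episodes_with_agents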

Vidya Ganesh