I am training several agents with the PPO algorithm in a multi-agent environment using RLlib/Ray. I am using ray.tune() to train the agents and then loading the training data from ~/ray_results. This data contains the actions chosen by the agents in each training episode, but I also need the corresponding agent rewards. I've looked at the documentation, but there doesn't seem to be a configuration argument that allows for saving episode rewards. Does anyone have a workaround for this?
mat123a
3 Answers
You need to add these values to the info dict, and they will then be collected by Ray Tune.
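For example, a minimal sketch of how per-agent rewards can be surfaced through RLlib's callbacks, which is the usual way to get custom values into the reported metrics. The import path is from the pre-2.x API (it later moved to ray.rllib.algorithms.callbacks), and MyMultiAgentEnv is a placeholder for your own registered environment:

from ray import tune
from ray.rllib.agents.callbacks import DefaultCallbacks

class RewardLoggingCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode, **kwargs):
        # episode.agent_rewards maps (agent_id, policy_id) -> total episode reward
        for (agent_id, policy_id), reward in episode.agent_rewards.items():
            # custom_metrics are aggregated (mean/min/max) per training iteration;
            # assign a list to episode.hist_data instead to keep per-episode values
            episode.custom_metrics["agent_{}_reward".format(agent_id)] = reward

tune.run(
    "PPO",
    config={
        "env": MyMultiAgentEnv,  # placeholder for your multi-agent env
        "callbacks": RewardLoggingCallbacks,
    },
)

The resulting custom_metrics columns then show up in progress.csv and result.json alongside the built-in reward stats.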

Rocket
Did you check progress.csv and result.json? The details of the reward for each agent in every episode can be found there.
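For instance, a small sketch of reading the rewards back out of result.json, which is newline-delimited with one JSON record per training iteration. The trial directory name below is a placeholder for your actual run:

import json
import os

result_path = os.path.expanduser(
    "~/ray_results/PPO/my_trial/result.json")  # placeholder trial directory

with open(result_path) as f:
    for line in f:  # one JSON record per training iteration
        row = json.loads(line)
        hist = row.get("hist_stats", {})
        # hist_stats holds per-episode lists: episode_reward,
        # episode_lengths, and one policy_<id>_reward list per policy
        print(hist.get("episode_reward"), hist.get("policy_0_reward"))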

vwaq
The episode_reward in result.json is by default the sum of all agent rewards per episode, and each policy_<id>_reward is the sum of the rewards of all agents assigned to that policy.
Example for 2 agents:
"hist_stats": {
"episode_reward": [527.0, 399.0, 165.0, 8.0, 268.0, 138.0, 154.0, 846.0],
"episode_lengths": [50, 50, 50, 50, 50, 50, 50, 50],
"policy_0_reward": [0.0, 0.0, 0.0, 8.0, 240.0, 138.0, 0.0, 0.0],
"policy_1_reward": [527.0, 399.0, 165.0, 0.0, 28.0, 0.0, 154.0, 846.0]
},
Alternatively, what you could do is change the summarize_episodes function in RLlib's metrics.py accordingly.
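As a rough sketch of what such a change could look like, assuming an older RLlib where summarize_episodes lives in ray/rllib/evaluation/metrics.py and each RolloutMetrics entry carries an agent_rewards field. Since Tune runs trainers in separate processes, actually editing the installed metrics.py (as suggested above) is more reliable than monkey-patching from the driver; the patch below only illustrates the shape of the change:

from ray.rllib.evaluation import metrics as rllib_metrics

_original_summarize = rllib_metrics.summarize_episodes

def summarize_episodes_with_agents(episodes, new_episodes=None, **kwargs):
    result = _original_summarize(episodes, new_episodes, **kwargs)
    # each RolloutMetrics entry has agent_rewards:
    # (agent_id, policy_id) -> total reward for that agent in that episode
    per_agent = {}
    for ep in episodes:
        for (agent_id, _), rew in ep.agent_rewards.items():
            per_agent.setdefault(
                "agent_{}_reward".format(agent_id), []).append(rew)
    # expose the per-agent lists next to the built-in hist_stats entries
    result["hist_stats"].update(per_agent)
    return result

rllib_metrics.summarize_episodes = summarize_episodes_with_agents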

Vidya Ganesh