
I've trained a Ray-RLlib PPOTrainer on a custom environment. How do I evaluate the policy at a specific state?

Full example:

import ray
from ray.rllib.agents import ppo
from ray.rllib.agents.ppo import PPOTrainer
from cust_env.envs import CustEnv
from ray.tune.logger import pretty_print

ray.init()
config = ppo.DEFAULT_CONFIG.copy()
config["num_workers"] = 2
config["eager"] = False
config["output"] = 'tmp/debug/'
trainer = PPOTrainer(config=config, env=CustEnv)

# Can optionally call trainer.restore(path) to load a checkpoint.

for i in range(101):
    result = trainer.train()
    if i % 10 == 0:
        print(pretty_print(result))

Is there a way, something like the following, to return the optimal action at a given state?

policy = trainer.get_policy()
optimal_action_at_state_S = policy.get_optimal_action(S)

The function policy.compute_actions() appears to return a random sample from the stochastic policy, not the optimal action.

Jeff
If you edit your question to include a minimal code example that reproduces the behavior you want, including imports, you are more likely to get a comprehensive answer. – user2653663 Dec 19 '19 at 13:41

2 Answers


According to a Ray developer I got in touch with via the Ray-dev Google group, the only way to accomplish this currently is to use a custom action distribution whose variance is set to zero. He did note, however, that an improved interface will be added soon.
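For reference, here is a minimal sketch of that idea for a continuous (Gaussian) action space. It assumes an RLlib version that supports ModelCatalog.register_custom_action_dist, the custom_action_dist model config key, and a DiagGaussian class in ray.rllib.models.tf.tf_action_dist whose sample op can be overridden via _build_sample_op; the class name and registration name below are made up for illustration, so check them against your installed version.

from ray.rllib.models import ModelCatalog
from ray.rllib.models.tf.tf_action_dist import DiagGaussian

class DeterministicGaussian(DiagGaussian):
    # Same learned mean as DiagGaussian, but the sample op ignores the
    # standard deviation, so every "sample" is the distribution mode.
    def _build_sample_op(self):
        return self.mean

ModelCatalog.register_custom_action_dist("det_gaussian", DeterministicGaussian)

config["model"]["custom_action_dist"] = "det_gaussian"
trainer = PPOTrainer(config=config, env=CustEnv)
action = trainer.compute_action(obs)  # obs: the observation for state S

Note that the trainer has to be built (or restored) with this config before querying it, and the approach only applies to continuous action spaces where the policy outputs a Gaussian.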

Jeff

Just to complement, as I bumped into this thread: this option is available now (see https://docs.ray.io/en/latest/_modules/ray/rllib/policy/policy.html). For example, call compute_actions() with explore=False to obtain a deterministic action.
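A short sketch of that call, assuming a recent RLlib release in which Policy.compute_actions and Trainer.compute_action accept an explore argument; obs stands for the observation of the state you want to query:

import numpy as np

policy = trainer.get_policy()
# compute_actions expects a batch of observations and returns a tuple of
# (actions, rnn_state_outs, extra_info).
actions, _, _ = policy.compute_actions(np.array([obs]), explore=False)
deterministic_action = actions[0]

# Equivalent convenience call at the trainer level:
deterministic_action = trainer.compute_action(obs, explore=False)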

Bjoern