I'm attempting to train a PPOTrainer for 250 iterations on a simple environment first, and then finish training it on a modified environment. (The only difference between the two environments is a change to one of the environment configuration parameters.)
So far I have tried implementing the following:
import ray
from ray.rllib.agents import ppo

import qsd  # module providing the custom QSDEnv environment

ray.init()
config = ppo.DEFAULT_CONFIG.copy()
config["env_config"] = defaultconfig  # base env configuration dict
trainer = ppo.PPOTrainer(config=config, env=qsd.QSDEnv)
trainer.config["env_config"]["meas_quant"] = 1

# First training phase on the original environment
for i in range(250):
    result = trainer.train()

# Attempt to change the parameter 'meas_quant' from 1 to 2
trainer.config["env_config"]["meas_quant"] = 2
trainer.workers.local_worker().env.meas_quant = 2

# Second training phase, which should use the modified environment
for i in range(250):
    result = trainer.train()
However, the second training run still uses the initial environment configuration. Any help in figuring out how to fix this would be greatly appreciated!
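I suspect this happens because each rollout worker holds its own copy (or copies) of the environment, so updating trainer.config and the local worker's env never reaches the remote workers. Here is a sketch of what I imagine might work, pushing the new value into every env instance on every worker (untested; it assumes my RLlib version exposes WorkerSet.foreach_worker and RolloutWorker.foreach_env, and that QSDEnv reads self.meas_quant on each reset):

# Propagate the new parameter to every env copy on every rollout worker
trainer.config["env_config"]["meas_quant"] = 2
trainer.workers.foreach_worker(
    lambda worker: worker.foreach_env(
        lambda env: setattr(env, "meas_quant", 2)
    )
)

Alternatively, would it be safer to checkpoint, rebuild the trainer with the modified env_config, and restore the weights? Something like:

# Save the trained state, rebuild with the new env_config, and restore
checkpoint_path = trainer.save()
config["env_config"]["meas_quant"] = 2
trainer = ppo.PPOTrainer(config=config, env=qsd.QSDEnv)
trainer.restore(checkpoint_path)

Is either of these the right approach, or is there a supported API for changing the env config mid-training?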