While I was trying to figure out the reset condition of the flocking env (from gym-flock), I came up with this question: can `return False` somehow end up returning True?
The core code is:
1: `test_model.py` in https://github.com/katetolstaya/multiagent_gnn_policies#available-algorithms
```python
import random

import gym
import gym_flock
import numpy as np
import torch

# DAGGER and MultiAgentStateWithDelay are defined in the multiagent_gnn_policies repo


def test(args, actor_path, render=True):
    # initialize gym env
    env_name = args.get('env')
    env = gym.make(env_name)
    if isinstance(env.env, gym_flock.envs.FlockingRelativeEnv):
        env.env.params_from_cfg(args)

    # use seed
    seed = args.getint('seed')
    env.seed(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

    # initialize params tuple
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    learner = DAGGER(device, args)
    n_test_episodes = args.getint('n_test_episodes')
    learner.load_model(actor_path, device)

    # the episode loop in question
    for _ in range(n_test_episodes):
        episode_reward = 0
        state = MultiAgentStateWithDelay(device, args, env.reset(), prev_state=None)
        done = False
        while not done:
            action = learner.select_action(state)
            next_state, reward, done, _ = env.step(action.cpu().numpy())
            next_state = MultiAgentStateWithDelay(device, args, next_state, prev_state=state)
            episode_reward += reward
            state = next_state
            if render:
                env.render()
        print(episode_reward)
    env.close()
```
2: The gym environment code, `flocking_relative.py` in https://github.com/katetolstaya/gym-flock/tree/stable/gym_flock/envs/flocking
```python
def step(self, u):
    #u = np.reshape(u, (-1, 2))
    assert u.shape == (self.n_agents, self.nu)
    #u = np.clip(u, a_min=-self.max_accel, a_max=self.max_accel)
    self.u = u * self.action_scalar

    # x position
    self.x[:, 0] = self.x[:, 0] + self.x[:, 2] * self.dt + self.u[:, 0] * self.dt * self.dt * 0.5
    # y position
    self.x[:, 1] = self.x[:, 1] + self.x[:, 3] * self.dt + self.u[:, 1] * self.dt * self.dt * 0.5
    # x velocity
    self.x[:, 2] = self.x[:, 2] + self.u[:, 0] * self.dt
    # y velocity
    self.x[:, 3] = self.x[:, 3] + self.u[:, 1] * self.dt

    self.compute_helpers()

    # the third element (done) is hard-coded to False
    return (self.state_values, self.state_network), self.instant_cost(), False, {}
```
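For reference, the kinematic part of `step` is a constant-acceleration (double-integrator) update for each agent $i$, where $\mathbf{p}_i$ is the position (`self.x[:, 0:2]`), $\mathbf{v}_i$ the velocity (`self.x[:, 2:4]`), $\mathbf{u}_i$ the scaled acceleration `self.u`, and $\Delta t$ is `self.dt`. The position is updated with the old velocity, since the position lines run before the velocity lines. Nothing in this update reads or changes `done`.

$$
\mathbf{p}_i^{t+1} = \mathbf{p}_i^{t} + \mathbf{v}_i^{t}\,\Delta t + \tfrac{1}{2}\,\mathbf{u}_i^{t}\,\Delta t^{2},
\qquad
\mathbf{v}_i^{t+1} = \mathbf{v}_i^{t} + \mathbf{u}_i^{t}\,\Delta t
$$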
For the `while` loop in `test_model.py` to break and reset the env, `done` has to become True at some point. However, `env.step` (code part 2) always returns `False` in the `done` position.
How does this loop break when `env.step` always returns False? I have tested the code and confirmed that it works fine, but I'm having a hard time understanding how.
I'd appreciate help from anyone experienced with RL and gym. Thank you very much in advance!
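One mechanism I suspect (an assumption, not confirmed for gym-flock): in classic gym, `gym.make` wraps an environment in a `TimeLimit` wrapper when it was registered with `max_episode_steps`, and that wrapper forces `done=True` once the step budget is used up, regardless of what the inner `step` returns. The `env.env` access in `test_model.py` hints that `env` is indeed a wrapper around the raw environment. Below is a self-contained toy sketch of that idea; `AlwaysRunningEnv`, `StepLimitWrapper` and `run_episode` are hypothetical names, not gym classes.

```python
class AlwaysRunningEnv:
    """Hypothetical toy env whose step() never reports done (like FlockingRelativeEnv.step)."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        # done is hard-coded to False, mirroring the env code above
        return self.t, -1.0, False, {}


class StepLimitWrapper:
    """Hypothetical sketch of what a TimeLimit-style wrapper does (not gym's actual class)."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # the wrapped env, reachable as wrapper.env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            # the wrapper, not the inner env, ends the episode
            info["TimeLimit.truncated"] = not done
            done = True
        return obs, reward, done, info


def run_episode(env):
    """Same loop shape as test(): keep stepping until done is True."""
    env.reset()
    done = False
    steps = 0
    episode_reward = 0.0
    while not done:
        _, reward, done, _ = env.step(None)
        episode_reward += reward
        steps += 1
    return steps, episode_reward


env = StepLimitWrapper(AlwaysRunningEnv(), max_episode_steps=200)
print(run_episode(env))  # (200, -200.0)
```

Calling `run_episode(AlwaysRunningEnv())` without the wrapper would never return. If this is what happens in gym-flock, printing `env` or checking `env.spec.max_episode_steps` inside `test()` should show it.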