Can the False returned by env.step become True somehow? (gym)

Question

While trying to figure out the reset condition of the flocking env (from gym-flock), I ran into this question: can a hard-coded 'return False' somehow end up as True for the caller?

The relevant code is:

1: test_model.py in https://github.com/katetolstaya/multiagent_gnn_policies#available-algorithms

    def test(args, actor_path, render=True):
        # initialize gym env
        env_name = args.get('env')
        env = gym.make(env_name)
        if isinstance(env.env, gym_flock.envs.FlockingRelativeEnv):
            env.env.params_from_cfg(args)

        # use seed
        seed = args.getint('seed')
        env.seed(seed)
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)

        # initialize params tuple
        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        learner = DAGGER(device, args)
        n_test_episodes = args.getint('n_test_episodes')
        learner.load_model(actor_path, device)

        # episode loop in question: it only exits when done becomes True
        for _ in range(n_test_episodes):
            episode_reward = 0
            state = MultiAgentStateWithDelay(device, args, env.reset(), prev_state=None)
            done = False
            while not done:
                action = learner.select_action(state)
                next_state, reward, done, _ = env.step(action.cpu().numpy())
                next_state = MultiAgentStateWithDelay(device, args, next_state, prev_state=state)
                episode_reward += reward
                state = next_state
                if render:
                    env.render()
            print(episode_reward)
        env.close()

2: gym environment code: flocking_relative.py in https://github.com/katetolstaya/gym-flock/tree/stable/gym_flock/envs/flocking

    def step(self, u):

        #u = np.reshape(u, (-1, 2))
        assert u.shape == (self.n_agents, self.nu)
        #u = np.clip(u, a_min=-self.max_accel, a_max=self.max_accel)
        self.u = u * self.action_scalar

        # x position
        self.x[:, 0] = self.x[:, 0] + self.x[:, 2] * self.dt + self.u[:, 0] * self.dt * self.dt * 0.5
        # y position
        self.x[:, 1] = self.x[:, 1] + self.x[:, 3] * self.dt + self.u[:, 1] * self.dt * self.dt * 0.5
        # x velocity
        self.x[:, 2] = self.x[:, 2] + self.u[:, 0] * self.dt
        # y velocity
        self.x[:, 3] = self.x[:, 3] + self.u[:, 1] * self.dt

        self.compute_helpers()

        # the third element (done) is always False here
        return (self.state_values, self.state_network), self.instant_cost(), False, {}

For the while loop in test_model.py to break and reset the env, done has to become True at some point. However, env.step (code part 2) always returns False in the place of done.

How does this loop break when env.step always returns False? I have tested and confirmed that this code works fine, but I am having a hard time understanding how.

Could anyone experienced in RL and gym please help? Thank you very much in advance.

score 0 · Answer 1 · answered Jun 05 '21 at 06:50

https://github.com/katetolstaya/gym-flock/blob/stable/gym_flock/__init__.py#L65

In the file above, the envs are registered with a step limit, for example:

register(
    id='FlockingLeader-v0',
    entry_point='gym_flock.envs.flocking:FlockingLeaderEnv',
    max_episode_steps=200,
)

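When an env is registered with max_episode_steps, gym.make does not return the raw env; it wraps it in gym's TimeLimit wrapper (this is also why test_model.py reaches the underlying env through env.env). The wrapper counts the steps of the current episode, and once the count reaches max_episode_steps it replaces the False returned by the inner step with True. So the env's own step never returns True; the wrapper around it does.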
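To illustrate how a wrapper can turn an env's constant False into True once a step limit is reached, here is a minimal sketch in plain Python. It does not import gym: AlwaysFalseEnv, StepLimit and run_episode are hypothetical names made up for this example, and StepLimit is only a simplified stand-in for a time-limit wrapper, not gym's actual source.

```python
class AlwaysFalseEnv:
    """Hypothetical stand-in for an env whose step() never reports done."""

    def reset(self):
        return 0  # dummy observation

    def step(self, action):
        # observation, reward, done, info; done is hard-coded to False
        return 0, -1.0, False, {}


class StepLimit:
    """Simplified stand-in for a time-limit wrapper (assumption, not gym's code)."""

    def __init__(self, env, max_episode_steps):
        self.env = env  # the unwrapped env stays reachable as .env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        if self.elapsed_steps >= self.max_episode_steps:
            done = True  # override the inner env's False
        return obs, reward, done, info


def run_episode(env):
    """Same loop shape as in test_model.py; returns the number of steps taken."""
    env.reset()
    done = False
    steps = 0
    while not done:
        _, _, done, _ = env.step(None)
        steps += 1
    return steps


print(run_episode(StepLimit(AlwaysFalseEnv(), max_episode_steps=200)))  # 200
```

Calling run_episode(AlwaysFalseEnv()) directly, without the wrapper, would loop forever, which is the behaviour the question expected to see.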
