I have created a custom single-agent Gym environment in which I am trying to train an agent using quite a simple action space and reward function:
self.action_space = spaces.MultiDiscrete([3, 3])
Each Gym step represents a single second in my custom simulator, and there is a time limit (i.e. a maximum episode length) of 43200 seconds.
The episode ends when the agent either:
- hits the maximum number of steps (the time limit), or
- gets injured in the environment to a degree where it cannot move.

In both cases done = True is returned.
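To make the setup concrete, here is a minimal sketch of what the environment looks like (the class name, observation space, reward, and injury check are all placeholders, not my actual implementation):

```python
import gym
import numpy as np
from gym import spaces


class MySimEnv(gym.Env):
    MAX_STEPS = 43200  # one step per simulated second

    def __init__(self, config=None):
        self.action_space = spaces.MultiDiscrete([3, 3])
        # Placeholder observation space; the real one is simulator-specific.
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self.steps += 1
        obs = self.observation_space.sample()  # placeholder observation
        reward = 0.0                           # placeholder reward
        # done on the time limit, or when the agent is injured and cannot move
        done = self.steps >= self.MAX_STEPS or self._agent_immobilized()
        return obs, reward, done, {}

    def _agent_immobilized(self):
        return False  # placeholder for the injury check
```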
I have used the default PPO parameters from RLlib. In addition, I am using custom callbacks, which can be provided on request.
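For context, the callbacks are in the spirit of the sketch below (the metric name some_metric is hypothetical; the real callbacks record simulator-specific values):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class MyCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode,
                       env_index=None, **kwargs):
        # Record a per-episode custom metric; RLlib aggregates these in
        # summarize_episodes() (the function that appears in the traceback below).
        info = episode.last_info_for() or {}
        episode.custom_metrics["some_metric"] = info.get("some_metric", 0.0)
```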
During training I have set the maximum number of iterations to 600, which won't result in many episodes (~55); however, this is easily changed.
The issue arises when the agent ends its episode prematurely, e.g. 6000 steps in. Since train_batch_size is 4000, one training iteration has already completed by that point. The episode then restarts and continues for another 2000 steps, where it hits the next train-batch boundary at 8000 steps.
At this point I get this error:
(PPO pid=1126475) /home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
(PPO pid=1126475) return _methods._mean(a, axis=axis, dtype=dtype,
(PPO pid=1126475) /home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
(PPO pid=1126475) ret = ret.dtype.type(ret / rcount)
2022-09-12 13:12:45,453 ERROR trial_runner.py:958 -- Trial trial-db0c7_00000: Error processing event.
Traceback (most recent call last):
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 924, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/tune/ray_trial_executor.py", line 787, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/worker.py", line 1713, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=1126475, ip=192.168.32.11, repr=PPO)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/tune/trainable.py", line 314, in train
result = self.step()
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 885, in step
raise e
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 867, in step
result = self.step_attempt()
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 920, in step_attempt
step_results = next(self.train_exec_impl)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/util/iter.py", line 756, in __next__
return next(self.built_iterator)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/util/iter.py", line 791, in apply_foreach
result = fn(item)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/execution/metric_ops.py", line 95, in __call__
res = summarize_episodes(episodes, orig_episodes)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/evaluation/metrics.py", line 197, in summarize_episodes
custom_metrics[k + "_min"] = np.min(filt)
File "<__array_function__ internals>", line 180, in amin
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2918, in amin
return _wrapreduction(a, np.minimum, 'min', axis, None, out,
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
2022-09-12 13:12:45,460 INFO trial_runner.py:1240 -- Trial trial-db0c7_00000: Attempting to restore trial state from last checkpoint.
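The ValueError at the bottom of the traceback appears to be NumPy refusing to reduce an empty array, which can be reproduced in isolation:

```python
import numpy as np

# min over an empty array has no identity element to fall back on,
# so NumPy raises instead of returning a value.
np.min(np.array([]))
# ValueError: zero-size array to reduction operation minimum which has no identity
```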
My batch_mode is truncated_episodes.
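For completeness, the relevant parts of the config look like this (a sketch; everything not shown is left at RLlib's PPO defaults, and MyCallbacks refers to the callback sketch above):

```python
config = {
    "env": MySimEnv,                     # the custom environment sketched above
    "train_batch_size": 4000,            # RLlib's PPO default
    "batch_mode": "truncated_episodes",
    "callbacks": MyCallbacks,
}
```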
I am not sure whether this is a hyperparameter problem or the result of something else in Ray.
Any help would be appreciated.