I have created a custom single-agent Gym environment in which I am trying to train an agent using quite a simple action space and reward function:
self.action_space = spaces.MultiDiscrete([3, 3])
Each Gym step represents a single second in my custom simulator, and there is a time limit (i.e. a maximum episode length) of 43200 seconds.
The episode ends when the agent either:
- hits the maximum number of steps (the time limit), or
- gets injured in the environment to a degree where it cannot move.

In both cases done = True is returned.
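To make the setup concrete, here is a minimal sketch of what the environment looks like (the class name, observation space, reward, and injury check are all placeholders, not my actual implementation):

```python
import gym
import numpy as np
from gym import spaces


class MySimEnv(gym.Env):
    MAX_STEPS = 43200  # one step per simulated second

    def __init__(self, config=None):
        self.action_space = spaces.MultiDiscrete([3, 3])
        # Placeholder observation space; the real one is simulator-specific.
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.observation_space.sample()

    def step(self, action):
        self.steps += 1
        obs = self.observation_space.sample()  # placeholder observation
        reward = 0.0                           # placeholder reward
        # done on the time limit, or when the agent is injured and cannot move
        done = self.steps >= self.MAX_STEPS or self._agent_immobilized()
        return obs, reward, done, {}

    def _agent_immobilized(self):
        return False  # placeholder for the injury check
```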
I have used the default PPO parameters from RLlib. In addition, I am using custom callbacks, which can be provided on request.
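For context, the callbacks are in the spirit of the sketch below (the metric name some_metric is hypothetical; the real callbacks record simulator-specific values):

```python
from ray.rllib.agents.callbacks import DefaultCallbacks


class MyCallbacks(DefaultCallbacks):
    def on_episode_end(self, *, worker, base_env, policies, episode,
                       env_index=None, **kwargs):
        # Record a per-episode custom metric; RLlib aggregates these in
        # summarize_episodes() (the function that appears in the traceback below).
        info = episode.last_info_for() or {}
        episode.custom_metrics["some_metric"] = info.get("some_metric", 0.0)
```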
During training I have set the maximum number of iterations to 600, which won't result in many episodes (~55); however, this is easily changed.
The issue arises when the agent ends its episode prematurely, e.g. 6000 steps in. Since train_batch_size is 4000, one training iteration has already completed by that point. The episode then restarts and continues for another 2000 steps, where it hits the next train-batch boundary at 8000 steps.
At this point I get this error:
(PPO pid=1126475) /home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
(PPO pid=1126475) return _methods._mean(a, axis=axis, dtype=dtype,
(PPO pid=1126475) /home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
(PPO pid=1126475) ret = ret.dtype.type(ret / rcount)
2022-09-12 13:12:45,453 ERROR trial_runner.py:958 -- Trial trial-db0c7_00000: Error processing event.
Traceback (most recent call last):
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/tune/trial_runner.py", line 924, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/tune/ray_trial_executor.py", line 787, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/worker.py", line 1713, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=1126475, ip=192.168.32.11, repr=PPO)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/tune/trainable.py", line 314, in train
result = self.step()
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 885, in step
raise e
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 867, in step
result = self.step_attempt()
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 920, in step_attempt
step_results = next(self.train_exec_impl)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/util/iter.py", line 756, in __next__
return next(self.built_iterator)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/util/iter.py", line 783, in apply_foreach
for item in it:
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/util/iter.py", line 791, in apply_foreach
result = fn(item)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/execution/metric_ops.py", line 95, in __call__
res = summarize_episodes(episodes, orig_episodes)
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/ray/rllib/evaluation/metrics.py", line 197, in summarize_episodes
custom_metrics[k + "_min"] = np.min(filt)
File "<__array_function__ internals>", line 180, in amin
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2918, in amin
return _wrapreduction(a, np.minimum, 'min', axis, None, out,
File "/home/dlopez-dlm/.cache/pypoetry/virtualenvs/ciaoplus-83eoOMxu-py3.9/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
2022-09-12 13:12:45,460 INFO trial_runner.py:1240 -- Trial trial-db0c7_00000: Attempting to restore trial state from last checkpoint.
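The ValueError at the bottom of the traceback appears to be NumPy refusing to reduce an empty array, which can be reproduced in isolation:

```python
import numpy as np

# min over an empty array has no identity element to fall back on,
# so NumPy raises instead of returning a value.
np.min(np.array([]))
# ValueError: zero-size array to reduction operation minimum which has no identity
```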
My batch_mode is truncated_episodes.
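For completeness, the relevant parts of the config look like this (a sketch; everything not shown is left at RLlib's PPO defaults, and MyCallbacks refers to the callback sketch above):

```python
config = {
    "env": MySimEnv,                     # the custom environment sketched above
    "train_batch_size": 4000,            # RLlib's PPO default
    "batch_mode": "truncated_episodes",
    "callbacks": MyCallbacks,
}
```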
I am not sure whether this is a hyperparameter problem or the result of something else in Ray.
Any help would be appreciated.