I am using the PPO algorithm provided by Ray to train an RL agent to stabilize traffic. During training, I keep seeing ValueError('Observation outside expected value range', Box(500,)
However, I don't know which part of my script is causing this issue, or whether it is caused by Flow at all.
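
To narrow this down, a small check along the following lines might help show which entries of an observation fall outside the declared bounds. This is only a sketch using NumPy: `find_out_of_bounds` is a hypothetical helper (not part of Ray or Flow), and the bounds of 0 and 1 are assumed purely for illustration. In practice the bounds would presumably come from the `low` and `high` attributes of the environment's observation space.

```python
import numpy as np

def find_out_of_bounds(obs, low, high):
    """Return (indices, values) of entries outside [low, high].

    Hypothetical helper for debugging; NaN entries are reported as well.
    """
    obs = np.asarray(obs, dtype=np.float64)
    low = np.broadcast_to(np.asarray(low, dtype=np.float64), obs.shape)
    high = np.broadcast_to(np.asarray(high, dtype=np.float64), obs.shape)
    # Comparisons with NaN are False, so negating "within bounds" flags NaN too.
    bad = ~((obs >= low) & (obs <= high))
    idx = np.flatnonzero(bad)
    return idx, obs.ravel()[idx]

# Illustrative observation of shape (500,) with assumed bounds [0, 1].
obs = np.zeros(500)
obs[3] = 1.5    # above the assumed upper bound
obs[42] = -0.2  # below the assumed lower bound

idx, vals = find_out_of_bounds(obs, 0.0, 1.0)
for i, v in zip(idx, vals):
    print(f"index {i}: value {v} is outside the expected range")
```

Calling something like this on the observation just before the environment returns it would indicate whether the out-of-range values come from my own observation code or from elsewhere.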