I'm trying to implement a python's Deep RL program, where the agent has to resolve the problem (approach a target) before the expiry of the time limit. Which is the best way to manage the time? It's a good idea to pass the remaining time as an input of the neural network? I tried to do that (remaining time as one of the entries describing the state of the environment) but the algorithm is not converging...
Any idea or tip? Thanks a lot!!