-1

I'm trying to implement a python's Deep RL program, where the agent has to resolve the problem (approach a target) before the expiry of the time limit. Which is the best way to manage the time? It's a good idea to pass the remaining time as an input of the neural network? I tried to do that (remaining time as one of the entries describing the state of the environment) but the algorithm is not converging...

Any idea or tip? Thanks a lot!!

Felipe
  • 19
  • 1

1 Answers1

0

Assuming you are trying to implement deep q learning, I think it's better to subtract the time remaining from the reward, like:

Q_target = (reward-time_remaining)+gamma*max(Q(s',a))
Dan
  • 59,490
  • 13
  • 101
  • 110