Managing a time limit in Deep Q-learning

Question

I'm trying to implement a Deep RL program in Python in which the agent has to solve a problem (approach a target) before a time limit expires. What is the best way to handle the time? Is it a good idea to pass the remaining time as an input to the neural network? I tried that (including the remaining time as one of the entries describing the state of the environment), but the algorithm does not converge...
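A minimal sketch of what "remaining time as one of the entries describing the state" might look like. The names `augment_state`, `steps_remaining` and `max_steps` are hypothetical, and scaling the time feature to [0, 1] is an assumption on my part, not something stated in the question: a raw step count can sit on a very different scale from the other state entries.

```python
from typing import List


def augment_state(observation: List[float], steps_remaining: int, max_steps: int) -> List[float]:
    """Append the remaining time, scaled to [0, 1], to the observation vector.

    Hypothetical helper: assumes the episode has a fixed budget of max_steps.
    """
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Scale the step count so the time feature has a range similar to
    # the other (assumed small-valued) state entries.
    time_feature = steps_remaining / max_steps
    return list(observation) + [time_feature]


# Example: 2-D offset to the target, with 150 of 200 steps left.
state = augment_state([0.3, -1.2], steps_remaining=150, max_steps=200)
print(state)  # [0.3, -1.2, 0.75]
```

The network's input layer then needs one extra unit for the time feature.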

Any ideas or tips? Thanks a lot!

Answer

Assuming you are trying to implement Deep Q-learning, I think it's better to subtract the remaining time from the reward rather than feed it to the network, so the Q-target becomes:

Q_target = (reward - time_remaining) + gamma * max_a' Q(s', a')
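A minimal sketch of computing that target for a single transition. The function `q_target` and its parameter names are hypothetical; `next_q_values` stands for the network's Q-value estimates for every action a' in s'. The `done` branch, which skips bootstrapping once the episode has terminated, is the usual DQN convention and is an addition not present in the formula above.

```python
from typing import Sequence


def q_target(reward: float, time_remaining: float, gamma: float,
             next_q_values: Sequence[float], done: bool) -> float:
    """DQN-style target with the remaining time subtracted from the reward.

    Hypothetical helper; assumes time_remaining is on a scale comparable
    to reward (e.g. normalized), so the penalty does not swamp the task reward.
    """
    # Fold the time penalty into the immediate reward.
    shaped_reward = reward - time_remaining
    if done:
        # Terminal transition: no bootstrapped value from s'.
        return shaped_reward
    # Bootstrap with the best estimated action value in the next state.
    return shaped_reward + gamma * max(next_q_values)


print(q_target(1.0, 0.25, 0.75, [4.0, 2.0], done=False))  # 3.75
print(q_target(1.0, 0.25, 0.75, [4.0, 2.0], done=True))   # 0.75
```

Because `time_remaining` is positive at every non-final step, each extra step adds to the accumulated penalty, which is what pushes the agent to reach the target sooner.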

edited Apr 02 '20 at 12:14

Dan

answered Apr 02 '20 at 04:42

Joel Joseph