Deep Q-learning: why can using the same network for the target network and the prediction network cause instability?

Asked May 11 '20 at 20:35

Active May 11 '20 at 20:35

Viewed 37 times

In deep Q-learning, I think of the neural network as playing the role of the Q-table in ordinary (tabular) Q-learning. In tabular Q-learning, the Q-table is updated in place while it is also used to compute the targets, so why can't we use the same network as both the target Q-network and the prediction Q-network? I searched on Google, and someone said it is like the network chasing its own tail, which makes training unstable. I find that hard to understand: how exactly does it become unstable? Tabular Q-learning updates its table in the same way, and it is stable.

I am confused.
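
To make the two setups concrete, here is a minimal sketch of what I am comparing. It assumes a tiny linear Q-function as a stand-in for the neural network; the names (`q_values`, `td_target`, `sgd_step`), the weights, and the numbers are all hypothetical and chosen only for illustration. It computes the bootstrapped target `r + gamma * max_a Q(s', a)` once from the same weights that are being trained and once from a separate frozen copy, then checks how each target reacts to a single update.

```python
import numpy as np

def q_values(weights, state):
    # Linear Q-function: one row of weights per action (stand-in for a network).
    return weights @ state

def td_target(weights, reward, next_state, gamma=0.99):
    # Bootstrapped target: r + gamma * max_a Q(s', a).
    return reward + gamma * np.max(q_values(weights, next_state))

def sgd_step(weights, state, action, target, lr=0.1):
    # One semi-gradient step on 0.5 * (target - Q(s, a))^2,
    # treating the target as a constant.
    new_weights = weights.copy()
    td_error = target - q_values(weights, state)[action]
    new_weights[action] += lr * td_error * state
    return new_weights

# Hypothetical weights: 2 actions, 3 state features.
online = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
frozen = online.copy()  # separate target weights, not touched by the update

state = np.array([1.0, 0.5, -0.2])
next_state = np.array([0.9, 0.6, -0.1])  # shares features with `state`
reward, action = 1.0, 0

target_before_shared = td_target(online, reward, next_state)
target_before_frozen = td_target(frozen, reward, next_state)

# Train the online weights toward the target computed from themselves.
online = sgd_step(online, state, action, target_before_shared)

target_after_shared = td_target(online, reward, next_state)
target_after_frozen = td_target(frozen, reward, next_state)

print("shared target:", target_before_shared, "->", target_after_shared)
print("frozen target:", target_before_frozen, "->", target_after_frozen)
```

In this sketch the update to Q(s, a) also changes Q(s', a), because the two states share features, so the target computed from the shared weights moves as soon as the weights are updated, while the target from the frozen copy stays where it was. In a table, by contrast, updating the entry for (s, a) leaves the entry for (s', a) untouched. I assume this is what "chasing its own tail" refers to, but I am not sure.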

asked May 11 '20 at 20:35

J.R.

I find the explanation on this page quite reasonable: https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-4-deep-q-networks-and-beyond-8438a3e2b8df – Poe Dator May 11 '20 at 20:50

@RuslanS. thank you! – J.R. May 12 '20 at 09:20
I read the article. So basically, it looks like DeepMind did research showing that training is more stable this way: https://arxiv.org/pdf/1509.02971.pdf – J.R. May 12 '20 at 09:31
