I have read the DQN paper.
In the paper, I found that training on randomly sampled past experiences (experience replay), rather than on consecutive samples, reduces divergence in RL with a non-linear function approximator.
If so, why does RL with a non-linear function approximator diverge when the input data are strongly correlated?
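
To make the question concrete, here is a minimal sketch of what I mean by random sampling versus learning from consecutive samples. It is my own toy illustration, not code from the paper: `ReplayBuffer`, `latest`, `spread`, and the one-dimensional "environment" whose state is just the timestep are all hypothetical names and assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement breaks the temporal ordering.
        return rng.sample(list(self.buffer), batch_size)

    def latest(self, batch_size):
        # The most recent transitions, i.e. a strongly correlated batch.
        return list(self.buffer)[-batch_size:]


def spread(batch):
    """Range of state values covered by a batch (toy measure of diversity)."""
    states = [transition[0] for transition in batch]
    return max(states) - min(states)


rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)

# Toy trajectory (assumption): the state is the timestep, so consecutive
# states are nearly identical, much like consecutive frames of a game.
for t in range(1000):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1)

recent = buffer.latest(32)         # consecutive samples
shuffled = buffer.sample(32, rng)  # randomly selected samples

# The consecutive batch covers states 968..999 only (spread 31), while the
# random batch is drawn from the whole trajectory.
print(spread(recent), spread(shuffled))
```

In this sketch the consecutive batch only ever shows the network a narrow slice of the state space, while the random batch covers a much wider range. My question is why the first situation leads to divergence.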