In Q-learning with experience replay, the agent stores its past experiences in a data structure and samples from it during training (a basic example would be a list of tuples in Python). For a complex state space, I would need to train the agent on a very large number of different situations to obtain a neural network (NN) that correctly approximates the Q-values. The experience data would then occupy more and more memory, so I should impose an upper limit on the number of experiences to be stored, after which older experiences should be dropped from memory.
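Here is a minimal sketch of the kind of buffer I have in mind, assuming each experience is a (state, action, reward, next_state, done) tuple; the class and method names are my own illustration, not from any particular library:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):
        # deque with maxlen gives FIFO eviction for free: once the buffer
        # holds `capacity` items, appending a new one drops the oldest.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one experience tuple.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, as in standard DQN-style training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```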
Do you think FIFO (first in, first out) would be a good eviction policy for the agent's memory? That way, after reaching the memory limit I would discard the oldest experiences, which might help the agent adapt more quickly to changes in the environment. Also, how could I compute a good maximum number of experiences to keep in memory so that Q-learning on the agent's NN converges towards the Q-function approximator I need? I know this could be done empirically; I would like to know whether an analytical estimator for this limit exists.
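To make the FIFO idea concrete, this is how eviction would behave at the limit with the sketch above (the capacity of 10,000 is an arbitrary placeholder, which is exactly the number I don't know how to choose analytically):

```python
buffer = ReplayBuffer(capacity=10_000)  # placeholder capacity, chosen arbitrarily

# Dummy experiences, just to illustrate FIFO eviction at the limit.
for t in range(20_000):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)

print(len(buffer))          # 10000: the buffer never exceeds its capacity
print(buffer.buffer[0][0])  # 10000: the 10,000 oldest experiences were dropped
```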