I'm currently thinking of implementing TD(λ) for a DQN. I know how to do it with a table (you update Q(s,a) and the eligibility trace e(s,a) for every state-action pair), but what happens when the Q-value comes from a function approximator (a neural network) instead? How would I update across all states, and how would the eligibility trace increments and decay work?
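For reference, here is roughly what I mean by the tabular version, a minimal sketch of Watkins-style Q(λ) with accumulating traces; the hyperparameters and the `env.step(...)` API are just placeholders:

```python
import numpy as np

def run_episode(env, Q, E, s, alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
    """Sketch of tabular Q(lambda): Q and E are (n_states, n_actions) arrays."""
    n_actions = Q.shape[1]
    E.fill(0.0)                                # reset traces at episode start
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)
        else:
            a = int(Q[s].argmax())
        greedy = a == int(Q[s].argmax())       # did we act greedily?

        s_next, r, done = env.step(a)          # hypothetical env API

        # one-step TD error toward the greedy bootstrap value
        target = r if done else r + gamma * Q[s_next].max()
        delta = target - Q[s, a]

        E[s, a] += 1.0                         # increment trace for the visited pair
        Q += alpha * delta * E                 # update ALL (s, a) pairs via their traces
        E *= (gamma * lam) if greedy else 0.0  # decay traces; cut after exploratory action

        s = s_next

# Usage (placeholder sizes):
# Q = np.zeros((n_states, n_actions)); E = np.zeros_like(Q)
# run_episode(env, Q, E, s0)
```

My question is what the analogue of `E` and of the "update all pairs" step is once Q(s,a) is a network output rather than a table entry.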
I've found two papers that might be related, but they don't really explain how to implement it; they mostly just show results. PDF Link 1 PDF Link 2