I have read this page from Stanford's PDP Handbook - https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html - but I am not able to understand how TD learning is used with neural networks. I am trying to build a checkers AI that uses TD learning, similar to the backgammon implementation described there. Please explain how TD back-propagation works.
I have already looked at this question - Neural Network and Temporal Difference Learning - but I cannot follow the accepted answer. Please explain it with a different approach if possible.
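To make my confusion concrete, here is a minimal sketch of what I *think* the TD update does, using a plain linear value function instead of the multilayer network from the handbook (the names `td0_update`, `alpha`, and `gamma` are mine, not from the handbook). Is this the core idea, with back-propagation just replacing the hand-written gradient when the value function is a neural network?

```python
# My current understanding of TD(0), NOT the handbook's exact algorithm:
# a linear value function V(s) = w . s estimates the chance of winning from
# board features s; after each move, w is nudged so that V(s) moves toward
# the bootstrapped target r + gamma * V(s_next).

def value(w, s):
    """Linear value estimate V(s) = w . s."""
    return sum(wi * si for wi, si in zip(w, s))

def td0_update(w, s, s_next, r, alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) step: move V(s) toward r + gamma * V(s_next).

    For a linear model the gradient of V(s) w.r.t. w is just s, so the
    weight change is alpha * (TD error) * s. With a neural network, I
    assume back-propagation supplies this gradient instead.
    """
    target = r if terminal else r + gamma * value(w, s_next)
    delta = target - value(w, s)  # TD error
    return [wi + alpha * delta * si for wi, si in zip(w, s)]

# Toy two-state chain: s0 -> s1 -> win (reward 1).
# Repeating the updates should drive both V(s0) and V(s1) toward 1,
# with V(s0) learning from the bootstrapped estimate V(s1).
w = [0.0, 0.0]
s0, s1 = [1.0, 0.0], [0.0, 1.0]
for _ in range(200):
    w = td0_update(w, s0, s1, 0.0)                    # non-terminal step
    w = td0_update(w, s1, None, 1.0, terminal=True)   # terminal win
```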