I have recently been looking into reinforcement learning. To learn about it, I have been reading Sutton and Barto's well-known book, Reinforcement Learning: An Introduction, but there is something I do not fully understand yet.
For Monte-Carlo learning, we can choose between the first-visit and every-visit algorithms, and it can be proved that both converge to the correct solution asymptotically. But I guess there is a practical difference between them (I understand the difference by definition, but I do not understand what the drawbacks of each method are). Should I use first-visit in some cases and every-visit in others?
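To make sure I have the definitions right, here is a minimal sketch of tabular Monte-Carlo prediction for both variants. The function name `mc_prediction` and the episode format (a list of `(state, reward)` pairs, where the reward is the one received when leaving that state) are my own hypothetical choices, not taken from the book. The only difference between the two variants is whether later occurrences of a state within the same episode also contribute a return to the average.

```python
from collections import defaultdict

def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate V(s) by averaging sampled returns (tabular Monte-Carlo).

    Assumed episode format (hypothetical): a list of (state, reward) pairs,
    where reward is received on the transition out of that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Compute the return G_t for every time step, working backwards.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            g = gamma * g + episode[t][1]
            returns[t] = g
        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                # First-visit: ignore later occurrences within this episode.
                continue
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

For example, with the single episode `[("A", 1), ("A", 1), ("B", 1)]` and `gamma=1.0`, the returns are 3, 2 and 1. First-visit gives V(A) = 3, while every-visit averages both visits and gives V(A) = 2.5; both give V(B) = 1.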
Thanks a lot, Djaz