I have recently been looking into reinforcement learning and have been reading the famous book by Sutton and Barto, but there is something I do not fully understand yet.

For Monte Carlo learning, we can choose between the first-visit and every-visit algorithms, and it can be proved that both converge asymptotically to the correct solution. But I guess there is a practical difference between the two (I understand how they differ by definition, but I do not understand the drawbacks of each method). Should I use first-visit in some cases and every-visit in others?

Thanks a lot, Djaz

Djazouli

1 Answer

In my personal experience, first-visit Monte Carlo converges faster and, for control problems, reaches the optimal policy in fewer iterations.

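I'm not sure whether there is a mathematical analysis of the convergence rates of the two, but both converge to the true expected return. For first-visit this follows directly from the law of large numbers, since each episode contributes one independent sample of the return per state. For every-visit the returns collected within one episode are correlated, so the argument is less direct, but the estimate still converges.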
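
To make the difference concrete, here is a minimal sketch of both estimators for policy evaluation. The one-state task, the function names `generate_episode` and `mc_prediction`, and the parameter `p_stay` are my own illustrative assumptions, not something taken from the book. The only difference between the two methods is whether a state's return is recorded once per episode or on every visit.

```python
import random
from collections import defaultdict


def generate_episode(rng, p_stay=0.5):
    """Hypothetical one-state task (an assumption for illustration):
    every step in state "S" yields reward 1, then the episode continues
    with probability p_stay and terminates otherwise."""
    episode = []
    while True:
        episode.append(("S", 1.0))
        if rng.random() >= p_stay:
            return episode


def mc_prediction(episodes, first_visit, gamma=1.0):
    """Estimate state values from episodes given as lists of (state, reward).

    first_visit=True  -> use only the return following the first visit
                         to each state within an episode.
    first_visit=False -> use the return following every visit.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Compute the return G_t for each time step by sweeping backward.
        G = 0.0
        returns = []
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        returns.reverse()  # back to chronological order

        seen = set()
        for state, G in returns:
            if first_visit and state in seen:
                continue  # first-visit: ignore repeat visits in this episode
            seen.add(state)
            returns_sum[state] += G
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


rng = random.Random(0)
episodes = [generate_episode(rng) for _ in range(20000)]
v_first = mc_prediction(episodes, first_visit=True)
v_every = mc_prediction(episodes, first_visit=False)
# With p_stay = 0.5 the true value of "S" is 1 / (1 - 0.5) = 2;
# both estimates should be close to that with this many episodes.
print(v_first["S"], v_every["S"])
```

In this toy task, a single episode of length 3 gives a first-visit estimate of 3.0 but an every-visit estimate of (3 + 2 + 1) / 3 = 2.0. Averaged over single episodes, every-visit comes out at 1.5 rather than the true value 2, so it is biased when only a few episodes are available, while first-visit is unbiased. With many episodes both approach the same value. This sketch does not say anything about which one converges faster in general.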

Fady