
I keep running into conflicting definitions across different texts of what it takes for a state to satisfy the Markov property.

Some presentations seem to define an MDP as one in which the current state/observation conveys absolutely all of the environmental information necessary to make an optimal decision.

Other presentations state only that the current state/observation must contain all of the relevant details from previously observed states in order to make the optimal decision (e.g. see: http://www.incompleteideas.net/book/ebook/node32.html).
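As I understand it, the formal statement in that chapter (using its notation, with $s_t$, $a_t$, $r_t$ for state, action and reward) is something like:

$$
\Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t\}
= \Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t, r_t, s_{t-1}, a_{t-1}, \ldots, r_1, s_0, a_0\}
$$

i.e. conditioning on the full history of the agent's states, actions and rewards adds nothing beyond the current state and action.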

The difference between these two definitions is significant. Under the first, some people argue that card games such as poker lack the Markov property, because we cannot know the cards our opponent is holding, and this incomplete information invalidates it.

The other definition, as I understand it, suggests that card games with hidden state (such as hidden cards) are in fact Markov, so long as the agent's current state summarizes everything it has observed so far, i.e. it makes decisions as if it had access to all of its own prior observations.
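To make concrete what I mean by "the state summarizes everything the agent has observed", here is a minimal sketch (the class and field names are just mine for illustration, not from any library): the agent's state is simply the tuple of its own observations so far, so by construction no earlier observation carries information beyond the current state, even though the opponent's hidden cards are not part of it.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class InfoState:
    # The agent's "state" is just everything it has observed so far,
    # e.g. its own hole cards, board cards, and betting actions it has seen.
    observations: Tuple[str, ...] = ()

    def step(self, new_observation: str) -> "InfoState":
        # Appending the new observation keeps the full observed history in the state,
        # so the current state contains all information from prior observations.
        return InfoState(self.observations + (new_observation,))

state = InfoState()
state = state.step("hole:Ah,Kd")
state = state.step("opponent:raise")
state = state.step("flop:2c,7h,Qs")
print(state.observations)  # the complete observed history, treated as the current state
```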

So which one does the Markov property refer to? Does it require complete information about the environment in order to make the optimal decision, or does it tolerate incomplete information and simply require that the agent's current state/observation summarize all of the agent's prior states? That is, in the poker example: as long as the current state contains all of the information we have observed so far, even if many variables remain hidden, would the Markov property be satisfied?
