
I have a question related to reinforcement learning: why is the environment state Markov? I read somewhere that it is Markov by definition, but I can't understand how the definition of the environment state implies that it is Markov.

Abhishek Bhatia

1 Answer


It isn't necessarily, but in general, reinforcement learning algorithms assume that you provide Markov states.

From chapter 3.5 of Reinforcement Learning: An Introduction:

What we would like, ideally, is a state signal that summarizes past sensations compactly, yet in such a way that all relevant information is retained. This normally requires more than the immediate sensations, but never more than the complete history of all past sensations. A state signal that succeeds in retaining all relevant information is said to be Markov, or to have the Markov property (we define this formally below).

Of course, it's unlikely that you'll ever be able to both provide a perfectly Markov state representation and actually learn from it; a fully Markov state may need to encode so much of the history that learning becomes impractical.

The Markov property is important in reinforcement learning because decisions and values are assumed to be a function only of the current state. In order for these to be effective and informative, the state representation must be informative. This means that not all the theory strictly applies to cases in which the Markov property does not strictly apply. However, the theory developed for the Markov case still helps us to understand the behavior of the algorithms, and the algorithms can be successfully applied to many tasks with states that are not strictly Markov.
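To make the distinction concrete, here is a small sketch (my own illustration, not from the answer) of a toy system where one state signal is Markov and another is not: a frictionless ball moving in 1D at constant velocity. If the signal is the position alone, two situations can look identical yet have different futures; if the signal is (position, velocity), identical signals imply identical futures.

```python
def step(position, velocity, dt=1.0):
    """Deterministic dynamics: the ball keeps its velocity."""
    return position + velocity * dt, velocity

# Two histories that end at the same position but with different velocities.
p1, v1 = step(0.0, 1.0)   # arrives at position 1.0 moving right
p2, v2 = step(2.0, -1.0)  # arrives at position 1.0 moving left
assert p1 == p2 == 1.0

# Position-only signal: the two signals are identical, but the futures
# differ, so the signal is not Markov.
next_p1, _ = step(p1, v1)
next_p2, _ = step(p2, v2)
assert next_p1 != next_p2  # 2.0 vs 0.0

# (position, velocity) signal: equal signals always produce equal next
# signals under these dynamics, so this richer signal is Markov.
```

This is the usual partial-observability situation: the underlying environment dynamics are unchanged, and only the information exposed to the agent determines whether its state signal is Markov.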

Nick Walker
  • Thanks. Can you elucidate from the environment's perspective? In the case of the environment, we have no control over defining the state, I suspect. – Abhishek Bhatia May 27 '16 at 23:59
  • I think I understand what you're asking about. You're right, we can't really change the way the environment *works*, but we can change how the agent sees it. So, we can't change the *state* but we can change the *state signal*, which is what the agent learns from. This signal can be any representation of the state, as much or as little information as the designer desires. This is discussed in 3.1 The Agent-environment Interface. Am I on the right track to addressing your question? – Nick Walker May 31 '16 at 03:13