
This code:

R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])

is from:

https://github.com/PacktPublishing/Artificial-Intelligence-By-Example/blob/47bed1a88db2c9577c492f950069f58353375cfe/Chapter01/MDP.py

R is defined as the "Reward matrix for each state". What are the states and rewards in this matrix?

# Reward for state 0
print('R[0,]:' , R[0,])

# Reward for state 1
print('R[1,]:' , R[1,])

prints:

R[0,]: [[0 0 0 0 1 0]]
R[1,]: [[0 0 0 1 0 1]]

Is [0 0 0 0 1 0] state 0 and [0 0 0 1 0 1] state 1?

blue-sky

1 Answer


According to the book that uses that example, R represents the reward of the transitions from one current state s to another next state s'.

Specifically, R is associated with the following graph:

[Image: graph of the states A to F and the transitions between them]

Each row of the matrix R represents a current state, labeled A to F, and each column represents a next state, also labeled A to F. The nonzero values represent the edges of the graph. For example, R[0,]: [[0 0 0 0 1 0]] means that you can go from state s=A to next state s'=E and receive a reward of 1. Similarly, R[1,]: [[0 0 0 1 0 1]] means that you receive a reward of 1 if you go from B to D or F. The goal seems to be reaching and remaining in C, which yields the largest reward (100, at R[2,2]).
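To make the row/column reading concrete, here is a minimal sketch that lists every rewarded transition in R. It assumes that `ql` in the original script is an alias for NumPy, and it uses a plain array instead of `ql.matrix`; the `labels` string and the `rewarded_transitions` helper are names made up for this illustration and do not come from the book.

```python
import numpy as np

# Assumption: `ql` in the original script is an alias for NumPy.
# A plain ndarray is used here instead of a matrix for simpler indexing.
R = np.array([[0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 100, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 0],
              [0, 1, 0, 0, 0, 0]])

# Hypothetical labelling: row/column index i corresponds to labels[i].
labels = "ABCDEF"

def rewarded_transitions(R, labels):
    """Return (state, next_state, reward) for every nonzero entry of R."""
    rows, cols = np.nonzero(R)  # indices of nonzero entries, row by row
    return [(labels[i], labels[j], int(R[i, j])) for i, j in zip(rows, cols)]

transitions = rewarded_transitions(R, labels)
for s, s_next, r in transitions:
    print(f"{s} -> {s_next}: reward {r}")
```

Each printed line is one edge of the graph: the row gives the current state, the column gives the next state, and the entry gives the reward. The only entry on the diagonal is the 100 for staying in C.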

Pablo EM
  • Thanks for sharing. Can you link to an online version of the book, or is this from a copy you own? – blue-sky Feb 09 '20 at 22:26
  • Since this part of the book corresponds to the first chapter, I was using the "first chapter preview" in Amazon. Not very useful in general, but enough for answering your question. Sorry! – Pablo EM Feb 10 '20 at 08:02