
I am studying the Q-learning algorithm (this is the tutorial I am following: https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/). Basically, we have some set of states (with some walls between them) and we need to be able to find an optimal path between any two states. In the rewards matrix M, M[i, j] = 1 if and only if there is a direct path between states i and j with no wall between them; otherwise it is 0. My question is: given some labyrinth (a set of states, as shown in the link), how can I generate the rewards matrix programmatically, instead of filling it in manually as the tutorial does? Thanks in advance :)
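One way to sketch this, assuming the labyrinth is an NxM 2D array where 0 marks a free cell and 1 marks a wall (that encoding, and the function name `build_rewards`, are my assumptions, not from the tutorial): map cell (r, c) to state index r * M + c, then set the reward to 1 for every pair of orthogonally adjacent free cells.

```python
import numpy as np

def build_rewards(labyrinth):
    """Build an (N*M)x(N*M) rewards matrix from an NxM grid labyrinth.

    Assumes labyrinth[r][c] == 0 is a free cell and 1 is a wall
    (adapt the test below if your grid uses a different encoding).
    Cell (r, c) becomes state index r * M + c.
    """
    n, m = len(labyrinth), len(labyrinth[0])
    rewards = np.zeros((n * m, n * m), dtype=int)
    for r in range(n):
        for c in range(m):
            if labyrinth[r][c] == 1:          # wall cell: no transitions at all
                continue
            # check the four orthogonal neighbours
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n and 0 <= nc < m and labyrinth[nr][nc] == 0:
                    rewards[r * m + c, nr * m + nc] = 1
    return rewards

# Example: 2x2 grid with a wall at (1, 0).
# States: 0=(0,0), 1=(0,1), 2=(1,0) wall, 3=(1,1)
lab = [[0, 0],
       [1, 0]]
R = build_rewards(lab)
```

This runs in O((N*M) * 4) time rather than the O(n^2) mentioned in the comments, since each cell only looks at its four neighbours; the resulting matrix is symmetric because adjacency in a grid is mutual.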

Petur Ulev
  • How is the labyrinth represented? In fact it has to be represented by a graph data structure, which is straightforwardly converted into the rewards matrix in O(n^2) time – mangusta Feb 18 '20 at 01:51
  • Not necessarily. It can be represented just as a 2D array of numbers (as I mentioned), the same way as the rewards matrix. The thing is that the dimensions of the rewards matrix will be larger. More precisely, if the labyrinth L is an NxM 2D array, then the rewards matrix is an (N * M)x(N * M) 2D array. That's it. – Petur Ulev Feb 18 '20 at 09:00
  • Oh, sorry, I did not mention that, hahaha :D – Petur Ulev Feb 18 '20 at 09:01
  • How could a labyrinth ever be represented in a way other than a graph? An NxM matrix cannot represent any labyrinth – mangusta Feb 18 '20 at 14:45

0 Answers