
I'm working on a Q-learning project that involves a circle solving a maze, and there is a problem with how I update the Q values, but I'm not sure where. I have spent three days on this now and I am at my wits' end.

Upon closer inspection, it seems that every dictionary within a row of Q is the same one (e.g., the value of [Direction.up] on row 3 is always 22, even when that shouldn't be the case).

Any pointers are welcome. Here is the code in question, hopefully enough for you to test it yourselves:

rows=cols=10
for i in range(rows):
    Q.append([{}]*(cols))
    for x in range(cols):
        for y in range(rows):
            Q[x][y][Direction.up]=0
            Q[x][y][Direction.down]=0
            Q[x][y][Direction.left]=0
            Q[x][y][Direction.right]=0
x=5
y=2
Q[x][y][Direction.right]=22
for x in range(cols):
    for y in range(rows):
        print(x," ",y)
        print(Q[x][y])
        print("\n")
    print("\n")
Jessica Chambers
  • There are missing classes and imports, so even though there are some 180 lines of code to plow through, there is no reproducible example. Now, one of the things going wrong is the way of thinking about `x` and `y`: `mymatrix[y][x]` (or `mymatrix[y, x]` for a NumPy array) is the way to access column `x` and row `y`. Although I have no idea how this impacts the rest of your code, pointing to the right rows and columns might help somewhere. – Uvar Feb 26 '18 at 14:34
  • @Uvar I managed to extract the problematic piece of code. I tried putting `[x,y]` but it gave me a "TypeError: list indices must be integers or slices, not tuple". Q is a 2D list with x rows and y columns; when I want to print the contents of one element it works OK, but when I store something in there, it assigns it to the whole row. – Jessica Chambers Feb 26 '18 at 15:08

2 Answers


One major problem is the data structure. I guess that you want to store one value per x, y and direction. But if you initialize your list of dictionaries with a multiplication

Q = [{}] * 10

you end up with a list of ten times the same dictionary, not ten different ones:

>>> Q = [{}] * 10
>>> Q
[{}, {}, {}, {}, {}, {}, {}, {}, {}, {}]
>>> Q[0]["k"] = "v"
>>> Q
[{'k': 'v'}, {'k': 'v'}, {'k': 'v'}, {'k': 'v'}, {'k': 'v'}, {'k': 'v'}, {'k': 'v'}, {'k': 'v'}, {'k': 'v'}, {'k': 'v'}]

So either create a separate dictionary on each iteration of a list comprehension

>>> Q = [{} for _ in range(10)]
>>> Q
[{}, {}, {}, {}, {}, {}, {}, {}, {}, {}]
>>> Q[0]["k"] = "v"
>>> Q
[{'k': 'v'}, {}, {}, {}, {}, {}, {}, {}, {}, {}]
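Applied to the Q-table in the question, the same idea can be nested so that every (row, column) cell gets its own dictionary. A minimal sketch, assuming a Direction enum with members up, down, left and right (the real class is not shown in the question, so the one below is a hypothetical stand-in):

```python
from enum import Enum


# Hypothetical stand-in for the asker's Direction class (not shown in the question).
class Direction(Enum):
    up = 0
    down = 1
    left = 2
    right = 3


rows = cols = 10

# The inner dict comprehension runs once per cell, so every cell gets its own dict
# with all four directions initialized to 0.
Q = [[{d: 0 for d in Direction} for _ in range(cols)] for _ in range(rows)]

Q[5][2][Direction.right] = 22

print(Q[5][2][Direction.right])  # 22
print(Q[5][3][Direction.right])  # 0, neighbouring cell in the same row is unaffected
print(Q[4][2][Direction.right])  # 0, same column in another row is unaffected
```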

or use just one dictionary with the tuple (x, y, direction) as the key:

Q = {}
for x in range(rows):
    for y in range(cols):
        for direction in Direction:
            Q[(x, y, direction)] = 0
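A minimal sketch of the flat-dictionary approach, again with a hypothetical stand-in Direction enum. best_direction is an illustrative helper name, not part of the original code; it picks the direction with the highest stored value for a cell, which is the kind of lookup a greedy Q-learning step needs:

```python
from enum import Enum


# Hypothetical stand-in for the asker's Direction class (not shown in the question).
class Direction(Enum):
    up = 0
    down = 1
    left = 2
    right = 3


rows = cols = 10

# One flat dict keyed by (x, y, direction); every key is an independent entry.
Q = {(x, y, d): 0 for x in range(rows) for y in range(cols) for d in Direction}

Q[(5, 2, Direction.right)] = 22


def best_direction(Q, x, y):
    """Hypothetical helper: return the direction with the highest Q value at (x, y)."""
    return max(Direction, key=lambda d: Q[(x, y, d)])


print(best_direction(Q, 5, 2))     # Direction.right
print(Q[(5, 3, Direction.right)])  # 0
```

With tuple keys, writing one entry cannot affect any other cell, and there is no nested structure that could be initialized with shared objects.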
YSelf

So the actual problem is that you are creating a list that holds cols references to one and the same dictionary {}, rather than cols separate dictionaries.

a = [{}]*3
b = [{} for _ in range(3)]
print(id(a[0]), id(a[1]), id(a[2])) # prints the same identity 3 times
print(id(b[0]), id(b[1]), id(b[2])) # prints 3 different identities

The problem is that the multiplication operator * works on an object that has already been evaluated. In [{}] * 3, the expression {} is evaluated once, creating a single dictionary, and then the list containing that one dictionary is repeated. In a comprehension, the expression {} is evaluated on every iteration, so each element is a new dictionary.

The * operator has no idea that your object came from an expression, nor that you want to copy any part of it. It therefore repeats references to the same object instead of creating new ones. This behaviour of the multiplication operator is a fundamental part of the language design, so it is up to us as Python users to adapt to it.
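A small sketch of what "references to the same object" means in practice: repetition only causes trouble when the shared object is mutated in place, and the same trap applies to nested lists such as [[0] * 3] * 2:

```python
# [0] * 3 also repeats one object, but ints are immutable:
# item assignment rebinds a single slot instead of changing the shared object.
a = [0] * 3
a[0] = 5
print(a)  # [5, 0, 0]

# [{}] * 3 repeats one dict; mutating it through any slot shows up in every slot.
b = [{}] * 3
b[0]["k"] = "v"
print(b)  # [{'k': 'v'}, {'k': 'v'}, {'k': 'v'}]

# Rebinding a slot (instead of mutating the dict) only affects that slot.
b[1] = {}
print(b)  # [{'k': 'v'}, {}, {'k': 'v'}]

# Same trap with nested lists: both rows are the same list object.
grid = [[0] * 3] * 2
grid[0][0] = 1
print(grid)                # [[1, 0, 0], [1, 0, 0]]
print(grid[0] is grid[1])  # True
```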

Incidentally, the same happens for the rows and cols definition.

rows = cols = 10
print(id(rows), id(cols)) # identities match

However, as integers are immutable, rebinding rows to a new object does not change cols:

rows = [3]
print(rows, cols) #[3] 10

Were you to use mutable objects though, you would end up with similar behaviour as you see in your current list of dictionaries problem:

rows = cols = {}
rows.update({1: 'a'})
print(rows, cols) #{1: 'a'} {1: 'a'}
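To get two independent dictionaries, create each one with its own expression. A short sketch contrasting the two forms:

```python
# Chained assignment binds both names to the single dict created on the right.
rows = cols = {}
print(rows is cols)  # True

# Two separate {} expressions create two distinct dicts.
rows, cols = {}, {}
rows.update({1: 'a'})
print(rows, cols)    # {1: 'a'} {}
print(rows is cols)  # False
```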

Now, where does that leave us in our quest to have your dictionaries update the way you want them to? Here is an adapted version (I took the liberty of simplifying some parts of the code that I thought were redundant):

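from enum import Enum

# Stand-in for the asker's Direction class (not shown in the question); assumed here.
class Direction(Enum):
    up = 0
    down = 1
    left = 2
    right = 3

rows = cols = 10
Q = []
for i in range(rows):
    # Append a row with a fresh dict for every cell.
    Q.append([{} for _ in range(cols)])
    # Index the row just appended (Q[i]), then each cell in it.
    for x in range(cols):
        Q[i][x][Direction.up] = 0
        Q[i][x][Direction.down] = 0
        Q[i][x][Direction.left] = 0
        Q[i][x][Direction.right] = 0
x = 5
y = 2
Q[x][y][Direction.right] = 22
for x in range(rows):
    for y in range(cols):
        print(x, '  ', y, '\n', Q[x][y], '\n', sep='')
    print("\n")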
Uvar