I'm working on a Q-learning project in which a circle solves a maze, and there is a problem with how I update the Q values, but I'm not sure where. I have legit spent 3 days on this now and I am at my wits' end.
On closer inspection, it seems that every dictionary within a given row of Q holds the same values (e.g. the value of [Direction.up] in row 3 is always 22, even when that shouldn't be the case).
Any pointers are welcome. Here is the code in question, hopefully enough that you can test it yourselves:
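from enum import Enum

# Assumed stand-in for my Direction type, added so the snippet runs on its own
class Direction(Enum):
    up = 0
    down = 1
    left = 2
    right = 3

Q = []
rows = cols = 10

# Build the table: one list per row, one dict per cell
for i in range(rows):
    Q.append([{}] * cols)

# Initialise every action value to 0
for x in range(cols):
    for y in range(rows):
        Q[x][y][Direction.up] = 0
        Q[x][y][Direction.down] = 0
        Q[x][y][Direction.left] = 0
        Q[x][y][Direction.right] = 0

# Change a single cell...
x = 5
y = 2
Q[x][y][Direction.right] = 22

# ...and print the whole table to see which cells changed
for x in range(cols):
    for y in range(rows):
        print(x, " ", y)
        print(Q[x][y])
        print("\n")
    print("\n")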
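
Edit: one likely cause, as far as I can tell, is the `[{}]*(cols)` line. Multiplying a list copies references rather than objects, so each row would end up holding the same dictionary `cols` times. Below is a minimal sketch of that behaviour and of one possible way to build the table with independent dictionaries. The `Direction` enum here is a hypothetical stand-in for my real one, and `shared_row` and `aliased` are names made up for the illustration.

```python
from enum import Enum

class Direction(Enum):  # hypothetical stand-in for the Direction type in the post
    up = 0
    down = 1
    left = 2
    right = 3

rows = cols = 10

# [{}] * cols repeats a reference to ONE dict, so the row holds cols aliases of it.
shared_row = [{}] * cols
shared_row[2][Direction.right] = 22
aliased = all(cell is shared_row[0] for cell in shared_row)

# A comprehension evaluates the dict expression once per cell,
# so every cell gets its own independent dict.
Q = [[{d: 0 for d in Direction} for _ in range(cols)] for _ in range(rows)]
Q[5][2][Direction.right] = 22
```

With the comprehension version, writing to `Q[5][2]` should leave `Q[5][3]` and every other cell untouched, whereas writing to any cell of `shared_row` shows up in all of them.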