
Why do position and newposition give the same output and update together in the next loop?

for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position=np.array([0,19])

    status = -1
    # loop over steps taken by the player
    while status == -1: #the status of the game is -1, terminate if 1 (see status_list above)

        # Find out what move to make using  
        q_in=Q[position[0],position[1]]

        
        move, action = action_fcn(q_in,epsilon,wind)
        
        # update location, check grid,reward_list, and status_list 
        
        newposition[0] = position[0] + move[0]
        newposition[1] = position[1] + move[1]
        
        print('new loop')
        print(newposition)
        print(position)
        
        
        grid_state = grid[newposition[0]][newposition[1]]
        reward = reward_list[grid_state]
        
        status = status_list[grid_state]
        status = int(status)
        
        if status == 1:
            Q[position[0],position[1],action]= reward
            break #Game over 
            
        else: Q[position[0],position[1],action]= (1-alpha)*Q[position[0],position[1],action]+alpha*(reward+gamma*Q[newposition[0],newposition[1],action])
           
        position = newposition

Printed output:

new loop
[16 26]
[16 26]
new loop
[17 26]
[17 26]
new loop
[18 26]
[18 26]
new loop
[19 26]
[19 26]
new loop
[19 25]
[19 25]
new loop
[20 25]
[20 25]
Hovercraft Full Of Eels
GAUSS
  • Where is `newposition` defined ? – keepAlive May 18 '21 at 18:39
  • If you actually have `newposition = position` in there, then I would expect this. In that case, both names are bound to the SAME array object. Changing one changes the other. I'm not sure why you're using `numpy` here, but seeing as you are, you can just write `newposition = position + move` and avoid this. – Tim Roberts May 18 '21 at 18:42
  • `newposition[0] = position[0] + move[0]` would fail unless `newposition` already exists. That's what we were asking. – Tim Roberts May 18 '21 at 18:42
  • You can only subscript pre-existing objects. – keepAlive May 18 '21 at 18:43
  • Please do not deface your question, especially after someone has put in the effort to answer it. This is not a polite or proper thing to do. – Hovercraft Full Of Eels May 18 '21 at 19:49
  • Even if you delete things your instructors can still see it in cached pages. Best to avoid posting "sensitive" information in the first place. Lesson learned – Hovercraft Full Of Eels May 19 '21 at 20:51

2 Answers


Apparently, somewhere in code you have not shown us, you do

>>> newposition = position

So when you modify newposition in place, you are modifying position as well.

So make newposition a different object from position, i.e. make sure that id(newposition) != id(position). I suspect the two ids are currently the same, aren't they?

Why do position and newposition give the same output and update together in the next loop?

Because they are the same object. I am not (only) saying that they are equal; I am saying that newposition is position, i.e. (newposition is position) currently evaluates to True.
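To make this concrete, here is a minimal sketch that reproduces the symptom. It assumes the hidden assignment really is newposition = position, and the moves list is made up purely for illustration:

```python
import numpy as np

position = np.array([0, 19])
newposition = position              # assumed hidden assignment: a second name for the same array

print(newposition is position)      # True: one object, two names

moves = [(1, 0), (1, 0), (0, -1)]   # hypothetical moves, not from the question
history = []
for move in moves:
    # these in-place writes land in the single shared array
    newposition[0] = position[0] + move[0]
    newposition[1] = position[1] + move[1]
    history.append((newposition.tolist(), position.tolist()))
    position = newposition          # rebinding keeps both names on that same array

for new, old in history:
    print(new, old)                 # both columns always match, as in the question
```

Note that the position = newposition line at the end of the loop is enough on its own to tie the two names together from the second step onwards.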

Just define newposition independently from position. For example:

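# [...]
for game in range(nr_of_games):
    # Initialize the player at the start position and store the current position in position
    position    = np.array([0,19])
    newposition = np.empty((2,), dtype=int)  # own integer buffer, still usable as an index
    # [...]
        # [...]
        # copy the values at the end of each step; a plain position = newposition
        # would bind both names to the same array again
        position = newposition.copy()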

Also, you may have good reasons for the element-wise version, but keep in mind that if move and position have the same shape and carry the same kind of information, you could simply do

# [...]
    # [...]
        # [...]
        # newposition[0] = position[0] + move[0]
        # newposition[1] = position[1] + move[1]
        newposition = position + move
        # [...]

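and drop the preallocation of newposition entirely. Because position + move builds a new array on every step, position = newposition at the end of the loop no longer leaves the two names sharing an array that is later modified in place.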
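A stripped-down sketch of that variant, with the game logic removed and a made-up list of moves standing in for whatever action_fcn returns, might look like this:

```python
import numpy as np

position = np.array([0, 19])
# hypothetical moves for illustration only
moves = [np.array([1, 0]), np.array([1, 0]), np.array([0, -1])]

trace = []
for move in moves:
    newposition = position + move   # allocates a new array; position is left untouched
    trace.append((newposition.tolist(), position.tolist()))
    position = newposition          # safe: the next step builds yet another new array

for new, old in trace:
    print(new, old)                 # the two columns now differ by exactly one move
```

The design point is that nothing is ever written in place, so it does not matter that position and newposition briefly name the same array between steps.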

keepAlive

That is because you are trying to copy one list to another with the = operator. Used with lists, = does not copy anything: it binds the name on the left to the same object as the name on the right, so both names refer to the same memory.

To get a real copy of a list, use the list.copy() method. NumPy arrays, which the question uses, have a copy() method as well.
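As a small illustration of the difference (variable names here are invented for the example), compare plain assignment with copy() for both a list and a NumPy array:

```python
import numpy as np

# plain Python list
a = [0, 19]
alias = a            # second name for the same list
clone = a.copy()     # new list holding the same elements
alias[0] = 5         # visible through a, not through clone

# NumPy array
arr = np.array([0, 19])
arr_alias = arr          # second name for the same array
arr_clone = arr.copy()   # independent array with its own data
arr_alias[0] = 5         # visible through arr, not through arr_clone

print(a, clone)
print(arr, arr_clone)
```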

Tony Suffolk 66
  • The same happens with all mutable containers, not just lists: sets, dicts, and any other container can trip you up. Actually the behaviour of `=` is exactly the same on all objects, but when the object is immutable the fact that you don't get a copy isn't a problem. Beginners expect `=` to give them a copy and it doesn't. Also note that if the list is a list of lists then list.copy() won't solve the problem; for nested containers you might want a deepcopy() of the container. – Tony Suffolk 66 May 18 '21 at 22:13
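The nested case from that last comment can be sketched as follows; the grid values are arbitrary and only there to show which copies follow the change:

```python
import copy

grid = [[0, 1], [2, 3]]
shallow = grid.copy()        # new outer list, but the inner lists are shared
deep = copy.deepcopy(grid)   # inner lists are copied as well

grid[0][0] = 99              # modify an inner list through the original

print(shallow[0][0])         # follows the change, because the inner list is shared
print(deep[0][0])            # keeps the old value
```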