
I have written a Q-learning agent that plays tic-tac-toe against a random player. I want to play the game 20 times and plot a single mean learning curve using matplotlib. The first for loop plays the game 20 times and produces a list of numpy.ndarrays, one per game. How do I compute the mean reward for each episode across those games, so that I can plot it as a single curve? Here is what I have done so far:

lines = []

# Play tic-tac-toe 20 times
for i in range(0, 20):

    # Instantiate the environment
    environment = tictactoe.Tictactoe(verbose=True)

    # Play the game; returns the rewards gained over m episodes
    line = play_tictactoe(environment,
                          player_o=player_o,
                          player_x=player_x,
                          episodes=m)
    # line is a numpy.ndarray.
    # For example, the first line could be [1, 2, 3, 4]
    # and the second [4, 5, 6, 7].
    lines.append(line)

for j in lines:
    avg_line = ...  # calculate the single mean learning curve
                    # for the two lines above I would want to plot 2.5, 3.5, 4.5, 5.5


ax2.plot(avg_line, color="red", label="Q-Agent")
ax2.set_title("The mean performance of 20 Q-Learning Agents")
ax2.set_xlabel('Episodes')
ax2.set_ylabel('Rewards')
plt.legend()
plt.show()
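A minimal, self-contained stand-in for the setup above, assuming each game returns a 1-D NumPy array with one reward per episode. `fake_play_tictactoe`, the reward values and `m = 10` are hypothetical; the real `play_tictactoe` is not shown. It illustrates that the 20 equal-length arrays stack into one 2-D array whose rows are games and whose columns are episodes.

```python
import numpy as np

# Hypothetical stand-in for play_tictactoe: returns one reward per episode.
# The real function and its reward scheme are not shown in the question.
def fake_play_tictactoe(episodes, rng):
    # Rewards drawn from {-1, 0, 1} (loss, draw, win) purely for illustration.
    return rng.choice([-1, 0, 1], size=episodes)

rng = np.random.default_rng(0)
m = 10  # number of episodes per game (assumed value)

lines = []
for i in range(20):
    line = fake_play_tictactoe(m, rng)
    lines.append(line)

# Stacking the 20 equal-length 1-D arrays gives a 2-D array:
# axis 0 = games played, axis 1 = episodes.
rewards = np.stack(lines)
print(rewards.shape)  # (20, 10)
```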
Rob

2 Answers


You can compute the mean of each line, store the results in a list using a list comprehension, and then plot the average line:

import numpy as np
import matplotlib.pyplot as plt

avg_line = [np.mean(j) for j in lines]  # list comprehension: one mean per line

x = np.arange(0, len(avg_line))
fig, ax2 = plt.subplots(1, 1)

ax2.plot(x, avg_line, color="red", label="Q-Agent")
Sheldore
  • I tried what you suggested and got this error message: ValueError: x and y must have same first dimension, but have shapes (10,) and (20,). What does this mean? – Rob Jun 17 '20 at 14:55
  • @Rob : Try my new code. I now used `len(avg_line)` instead of `len(line)` – Sheldore Jun 17 '20 at 15:15
  • Hi, this isn't what I wanted. I wanted episodes on the x-axis, with rewards on the y-axis. For example, if agent1 got 1, 2, 3, 4 over 4 episodes and agent2 got 5, 6, 7, 8, I'd like to plot 3, 4, 5, 6. – Rob Jun 17 '20 at 16:38
  • @Rob Sorry, I can't help more until you provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) – Sheldore Jun 17 '20 at 17:25
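A short sketch, using the two-agent example from the comments, of why the list comprehension does not give the requested curve: `np.mean(j)` averages within each game, so it yields one value per game (20 in the question), while the x values presumably came from the episode count (10), which would explain the `(10,)` vs `(20,)` shape mismatch. Averaging across games with `axis=0` instead yields one value per episode.

```python
import numpy as np

# Two agents, four episodes each (the example from the comments).
lines = [np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8])]

# Mean of each line: one number per agent, averaged over its episodes.
per_game = [float(np.mean(j)) for j in lines]
print(per_game)  # [2.5, 6.5]

# Mean across lines: one number per episode, averaged over agents.
per_episode = np.mean(lines, axis=0)
print(per_episode)  # [3. 4. 5. 6.]
```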

If I understand correctly, `lines` looks like this:

lines = [
    [1, 2, 3, 4],   |
    [5, 6, 7, 8],   |  times played
    [4, 3, 1, 2],   |
    [5, 6, 7, 8],   |
    [4, 3, 1, 2],   v
]
     -> episodes

and you want to average over the times-played axis (axis 0) and plot the result against the episode index.

You can do this with:

plt.plot(np.mean(lines, axis=0))
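A fuller, runnable sketch of the same idea, reusing the axis labels and colour from the question. The three example rows are taken from the diagram above and stand in for the 20 real games; `np.stack` assumes every game played the same number of episodes, which holds when each call uses the same `m`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Example rewards: 3 games x 4 episodes. In the question this would be the
# list of 20 arrays collected in `lines`.
lines = [
    np.array([1, 2, 3, 4]),
    np.array([5, 6, 7, 8]),
    np.array([4, 3, 1, 2]),
]

rewards = np.stack(lines)                       # shape (n_games, n_episodes)
avg_line = rewards.mean(axis=0)                 # average over games -> one value per episode
episodes = np.arange(1, rewards.shape[1] + 1)   # 1-based episode numbers for the x-axis

fig, ax2 = plt.subplots()
ax2.plot(episodes, avg_line, color="red", label="Q-Agent")
ax2.set_title(f"The mean performance of {rewards.shape[0]} Q-Learning Agents")
ax2.set_xlabel("Episodes")
ax2.set_ylabel("Rewards")
ax2.legend()
plt.show()
```

Passing `episodes` explicitly makes the x-axis start at 1; `plt.plot(avg_line)` alone would number the episodes from 0.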
warped