Using matplotlib to plot mean learning curve of agents playing tictactoe

Question

I have written a Q-learning agent that plays tic-tac-toe against a random player. I want to play the game 20 times and plot a single mean learning curve using matplotlib. The first for loop plays the game twenty times and produces a list of numpy.ndarrays. How do I get the mean reward per episode across those runs, so that I can plot it as a single curve? Here is what I have done so far:

lines = []

#play tictactoe 20 times
for i in range(0, 20):

    # Instantiate environment        
    environment = tictactoe.Tictactoe(verbose=True)

    # play the game which returns the rewards gained in a number of episodes
    line = play_tictactoe(environment,
                               player_o=player_o,
                               player_x=player_x,
                               episodes=m)
    # line is a numpy.ndarray of per-episode rewards
    # for example, the first line could be 1,2,3,4 and the second
    # could be 4,5,6,7
    lines.append(line)

for j in lines:
    avg_line = #calculate the single mean learning curve
               # I would want to plot 2.5, 3.5, 4.5, 5.5


ax2.plot(avg_line, color="red", label="Q-Agent")
ax2.set_title("The mean performance of 20 Q-Learning Agents")
ax2.set_xlabel('Episodes')
ax2.set_ylabel('Rewards')
plt.legend()
plt.show()

Sheldore · Answer 1 · 2020-06-17T15:15:04.420

1

You can compute the mean of each line, store the results in a list using a list comprehension, and then plot the average line:

import numpy as np
import matplotlib.pyplot as plt

avg_line = [np.mean(j) for j in lines]  # list comprehension: one mean per array in lines

x = np.arange(0, len(avg_line))
fig, ax2 = plt.subplots(1, 1)

ax2.plot(x, avg_line, color="red", label="Q-Agent")
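
As a quick check on what this computes, here is a small sketch with made-up rewards (two runs of four episodes, not the asker's data). `np.mean(j)` collapses each run to a single number, so `avg_line` ends up with one entry per run rather than one per episode:

```python
import numpy as np

# Hypothetical rewards: 2 runs of 4 episodes each (illustrative values only)
lines = [np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8])]

# One mean per run: the result's length equals the number of runs
avg_line = [np.mean(j) for j in lines]
x = np.arange(0, len(avg_line))

print(len(lines[0]))  # episodes per run
print(len(avg_line))  # number of runs, which is also len(x)
```

With 20 runs this gives 20 points on the x-axis, regardless of how many episodes each run contains.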

edited Jun 17 '20 at 15:15

answered Jun 17 '20 at 14:45


I tried what you suggested and I got this error message ValueError: x and y must have same first dimension, but have shapes (10,) and (20,). What does this mean? – Rob Jun 17 '20 at 14:55
@Rob : Try my new code. I now used `len(avg_line)` instead of `len(line)` – Sheldore Jun 17 '20 at 15:15
Hi, this isn't what I wanted. I wanted episodes on the x-axis, with rewards on the y-axis. For example, if agent1 got 1, 2, 3, 4 over 4 episodes and agent2 got 5, 6, 7, 8, I'd like to plot 3, 4, 5, 6. – Rob Jun 17 '20 at 16:38
@Rob Sorry, I can't help more until you provide a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) – Sheldore Jun 17 '20 at 17:25

score 0 · Answer 2 · answered Jun 18 '20 at 06:46

If I understand correctly, lines looks like this:

                |
                v   
[              t p
    [1,2,3,4], i l
    [5,6,7,8], m a
    [4,3,1,2], e y
    [5,6,7,8], s e
    [4,3,1,2],   d

]              
    -> episodes

and you want to take the mean over the times_played axis and plot it against the episode index.

You can do this with

plt.plot(np.mean(lines, axis=0))
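
A minimal, self-contained sketch of this, using the two-agent example from the comments (rewards 1, 2, 3, 4 and 5, 6, 7, 8) as a stand-in for real `play_tictactoe` output. It assumes every run has the same number of episodes, so that the list stacks into a 2-D array; the name `rewards` is introduced here for illustration only:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the list built in the question: one reward array per run.
# Values are the example from the comments, not real game results.
lines = [np.array([1, 2, 3, 4]), np.array([5, 6, 7, 8])]

# Stack into shape (runs, episodes); this requires equal-length runs.
rewards = np.vstack(lines)

# axis=0 averages across runs, leaving one value per episode.
avg_line = rewards.mean(axis=0)

fig, ax2 = plt.subplots()
ax2.plot(avg_line, color="red", label="Q-Agent")
ax2.set_xlabel("Episodes")
ax2.set_ylabel("Rewards")
ax2.legend()
# plt.show()  # uncomment to display the figure
```

For this input the averaged curve is 3, 4, 5, 6, which matches what the asker described. Passing `axis=1` instead would average over episodes and return one value per run.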
