I have written a Q-learning agent that plays tic-tac-toe against a random player. I want to play the game 20 times and plot a single mean learning curve using matplotlib. The first for loop plays the game twenty times and produces a list of numpy.ndarrays, one per game. How do I get the mean reward per episode across those arrays, so that I can plot it as a single curve? Here is what I have done so far:
lines = []
# play tictactoe 20 times
for i in range(0, 20):
    # Instantiate environment
    environment = tictactoe.Tictactoe(verbose=True)
    # play the game, which returns the rewards gained over a number of episodes
    line = play_tictactoe(environment,
                          player_o=player_o,
                          player_x=player_x,
                          episodes=m)
    # line is a numpy.ndarray
    # for example, the first line could be 1, 2, 3, 4 and the second
    # could be 4, 5, 6, 7
    lines.append(line)
for j in lines:
    avg_line = #calculate the single mean learning curve
    # I would want to plot 2.5, 3.5, 4.5, 5.5
ax2.plot(avg_line, color="red", label="Q-Agent")
ax2.set_title("The mean performance of 20 Q-Learning Agents")
ax2.set_xlabel('Episodes')
ax2.set_ylabel('Rewards')
plt.legend()
plt.show()
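
One possible way to get the curve described above is to stack the arrays and average down the first axis. This is only a minimal sketch: it assumes every array in `lines` is one-dimensional and has the same length (one reward per episode), and `mean_learning_curve` is a hypothetical helper name, not part of the code above. The two sample arrays are the ones from the comments in the snippet.

```python
import numpy as np


def mean_learning_curve(lines):
    """Average equal-length reward curves element-wise (hypothetical helper).

    Assumes each entry of `lines` is a 1-D array with one reward per episode
    and that all entries have the same length.
    """
    stacked = np.stack(lines, axis=0)  # shape: (n_runs, n_episodes)
    return stacked.mean(axis=0)        # shape: (n_episodes,)


# The two example runs from the comments in the question
lines = [np.array([1, 2, 3, 4]), np.array([4, 5, 6, 7])]
avg_line = mean_learning_curve(lines)
print(avg_line)  # [2.5 3.5 4.5 5.5]
```

With this approach the `for j in lines:` loop would not be needed: `avg_line` is computed once from the whole list and can then be passed to `ax2.plot(avg_line, ...)`. If the runs could differ in length, `np.stack` raises a `ValueError`, so the arrays would have to be truncated or padded first.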