
I have trained an RL agent in an environment similar to PuckWorld. There's no puck, though! The agent lives in a continuous space and wants to reach a fixed target. At the start of each episode the agent is spawned at a random location, and noise is added to each action to make learning less trivial. The reward on every step is the negative, scaled distance to the target.
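Concretely, the per-step reward looks roughly like this (the variable names and the scale value are placeholders, not my actual code):

```python
import numpy as np

# Placeholder names: the per-step reward is the negative, scaled
# Euclidean distance between the agent and the fixed target.
scale = 0.1
agent_pos = np.array([3.0, 4.0])
target_pos = np.array([0.0, 0.0])
reward = -scale * np.linalg.norm(agent_pos - target_pos)  # -0.5 for these positions
```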

I want to plot the convergence of the neural network. For the same problem in a discrete state space with Q-learning, I would plot the sum of all elements of the Q matrix against the episode number; this gave me a good picture of the learning progress. How can I do the same for a neural network?
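The closest analogue I can think of is to evaluate the network on a fixed batch of probe states after every episode and log a summary of its outputs (e.g. the sum or mean of the predicted Q-values), which plays the same role as summing the Q matrix. A rough sketch, assuming a DQN-style network with discrete actions; `q_net`, the sizes, and the loop below are placeholders rather than my actual training code:

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Placeholder network: state in, one Q-value per discrete action out
# (assumption: DQN-style agent; an actor-critic would track the critic instead).
state_dim, num_actions = 4, 4
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

# Fixed batch of probe states, sampled once and reused every episode so the
# logged statistic is comparable across episodes; this is the continuous-space
# counterpart of "all cells of the Q matrix".
probe_states = torch.rand(256, state_dim)

q_history = []
num_episodes = 500
for episode in range(num_episodes):
    # ... run one training episode here (omitted) ...

    with torch.no_grad():
        q_values = q_net(probe_states)           # shape (256, num_actions)
        q_history.append(q_values.sum().item())  # analogue of Q.sum() per episode

plt.plot(q_history)
plt.xlabel("episode")
plt.ylabel("sum of Q over probe states")
plt.show()
```

In the tabular case this statistic reduces exactly to the Q-matrix sum I plotted before.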

Plotting the reward collected in an episode against the episode number is not very informative here. I use PyTorch. Any help is appreciated.

  • Why is the plot of rewards not optimal for this problem? – nsidn98 May 09 '20 at 14:38
  • @nsidn98 The agent starts at random points, which may be far from the goal or quite close to it. The reward given on each step is equal to -(scaled distance from the target), so even if the agent reaches the target from far away, its total reward may not be large. But I think I just need to plot the average loss per episode here, roughly as in the sketch below. – Ravi Pradip May 11 '20 at 06:27
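A rough sketch of what the last comment means by plotting the average loss per episode; the loop structure, sizes, and the placeholder loss below stand in for the actual PyTorch update code:

```python
import torch
import matplotlib.pyplot as plt

# Log the mean training loss of each episode and plot it against the episode
# number. The "loss" below is a stand-in for the real per-step TD loss
# (e.g. F.mse_loss(q_pred, q_target)) computed in the update step.
episode_losses = []
num_episodes, steps_per_episode = 300, 50

for episode in range(num_episodes):
    step_losses = []
    for step in range(steps_per_episode):
        loss = torch.rand(1)            # placeholder for the real per-step loss
        step_losses.append(loss.item())
    episode_losses.append(sum(step_losses) / len(step_losses))

plt.plot(episode_losses)
plt.xlabel("episode")
plt.ylabel("mean loss per episode")
plt.show()
```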

0 Answers