
Is it normal for the learn throughput to decrease and the learn time to increase as a Dueling DDQN agent gets trained?

A 7x increase in learn time after a few hours of training seems significant; is this something you would expect to see? My system is only using about 20% of the 8-core CPU and 25 GB out of 64 GB of memory.

A ray.rllib.agents.dqn model is currently being trained on the CPU. The config values are all defaults except:

config['timesteps_per_iteration'] = 5000
config['noisy'] = True
config['compress_observations'] = True
config["num_workers"] = 4
config["num_envs_per_worker"] = 8
config["eager"] = True 

[Screenshot: training stats showing the increased learn time and reduced learn throughput]

After further training, the learn throughput has dropped even further, to 20. CPU usage remains around 20%, and memory usage is at 50 GB.

[Screenshot: later training stats with learn throughput down to 20]

Using Ray 0.8.5, TensorFlow 2.2.0, Python 3.8.3, Ubuntu 18.04 inside WSL2

Nyxynyx
I think something important to consider is whether your episodes get shorter or longer as the agent gets better. Learning time may increase because your episodes get longer. – ivallesp Nov 23 '20 at 09:41
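
One way to check this is to log the mean episode length alongside the timing stats each iteration. A minimal sketch, assuming the `trainer` object from the setup above and the standard RLlib result keys (`episode_len_mean` is reported per iteration; the exact location of the learn-time stats can vary between Ray versions, hence the `.get()` fallbacks):

    # Sketch: per-iteration check of whether episodes are getting longer.
    # Assumes `trainer` is the DQNTrainer built in the sketch above.
    def log_episode_length(result):
        timers = result.get('timers', {})
        print('episode_len_mean:', result['episode_len_mean'],
              '| learn_time_ms:', timers.get('learn_time_ms'),
              '| learn_throughput:', timers.get('learn_throughput'))

    for _ in range(50):
        log_episode_length(trainer.train())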

0 Answers