
I implemented multi-agent PPO in RLlib with a custom environment. It learns and works well, except that training is slow. I suspect an underutilized CPU may be the cause, so I want to know what `ray/tune/perf/cpu_util_percent` measures. Does it cover only the rollout workers, or is it averaged over the learner as well? And what could be causing the low utilization? (All my runs average about 13% CPU usage.)

Run on GCP
Ray 2.0
Python 3.9
PyTorch 1.12
Head: n1-standard-8 with 1 V100 GPU
2 workers: c2-standard-60
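
As a sanity check independent of the Tune metric, something like the following could sample system-wide CPU on the cluster's nodes via Ray tasks (a minimal sketch, assuming psutil is importable on every node; Ray may pack the tasks onto a single machine, so the hostnames in the output show which nodes were actually sampled):

```python
import socket

import psutil
import ray

ray.init(address="auto")  # attach to the running cluster

@ray.remote(num_cpus=1)
def sample_cpu():
    # System-wide CPU utilization of whichever node this task lands on.
    return socket.gethostname(), psutil.cpu_percent(interval=1)

# Launch several tasks; the hostnames show which of the three machines
# were sampled, and the percentages are per-node averages.
print(ray.get([sample_cpu.remote() for _ in range(16)]))
```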

num_workers: 120  # rollout workers, not machines (num_workers == num_rollout_workers)
num_envs_per_worker: 1
num_cpus_for_driver: 8
num_gpus: 1
num_cpus_per_worker: 1
num_gpus_per_worker: 0
train_batch_size: 12000
sgd_minibatch_size: 3000
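
For reference, the same settings expressed as the legacy config dict that tune.run("PPO", ...) accepts in Ray 2.0 (a sketch only; `MyMultiAgentEnv` and the stop criterion are placeholders, and the `multiagent` policy mapping is omitted):

```python
from ray import tune

config = {
    "env": "MyMultiAgentEnv",   # placeholder for the custom environment
    "framework": "torch",
    "num_workers": 120,         # rollout workers, not machines
    "num_envs_per_worker": 1,
    "num_cpus_for_driver": 8,
    "num_gpus": 1,
    "num_cpus_per_worker": 1,
    "num_gpus_per_worker": 0,
    "train_batch_size": 12000,
    "sgd_minibatch_size": 3000,
}

tune.run("PPO", config=config, stop={"training_iteration": 100})
```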

I tried a smaller batch size (4096) with fewer workers (10), as well as a larger batch size (480000); every run still showed 10–20% CPU usage.

I cannot share the code.

  • *All my runs give average of 13% CPU usage.* Which could mean that your code is fully occupying one core on an eight-core CPU? – High Performance Mark Dec 22 '22 at 11:20
  • Thank you for the comment. Yes, indeed: when I check the RLlib code, it does not use `percpu=True`, so it is measuring the system-wide percentage. But since I'm using 128 vCPUs in total across three machines, does this number still make sense? – Kuan-Ho Lao Dec 22 '22 at 15:38
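
To illustrate the distinction raised in the comments (an inference from the comment above, not verified against the RLlib/Tune source): if the metric comes from a plain psutil.cpu_percent() call on a single node, it is an average over that node's logical cores only. One busy core among many idle ones, or a head node that mostly waits on remote rollout workers, both show up as a low percentage, and the two c2-standard-60 machines would then not be reflected in the number at all.

```python
import psutil

# System-wide average over all logical cores of the node this runs on
# (what psutil.cpu_percent() returns without percpu=True).
print(psutil.cpu_percent(interval=1))               # e.g. 13.0

# Per-core utilization on the same node; a few busy cores among many
# idle ones average out to a low system-wide number.
print(psutil.cpu_percent(interval=1, percpu=True))  # e.g. [98.0, 2.0, 1.5, ...]
```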

0 Answers