
I create several workers and a learner using Ray. How can I ensure that each actor runs on its own CPU core without contention?

Maybe
  • Do you have a particular reason to do this? Generally, letting the operating system's scheduler do its thing works pretty well; pinning is often counterproductive to performance. – Charles Duffy Sep 11 '19 at 23:30
  • Hi, @CharlesDuffy. Because I don't want to cause contention between actors – Maybe Sep 11 '19 at 23:34
  • Are you sure contention *really is* a problem? Again, the OS scheduler is good at what it does; only one process can fit on a core at a time, and if you have more cores than you have processes, then they're going to be fine; contrariwise, if you have more immediately-schedulable processes than cores, there's nothing you can do to *prevent* contention. – Charles Duffy Sep 11 '19 at 23:43
  • ...granted, there's a cost to moving a process between cores if they don't share the same cache, but again, the OS scheduler *knows about that cost*, and pays it only when it makes sense to do so. I'm speaking here as someone with a real-world extremely-parallel workload and real-world extremely-parallel hardware, where when we implemented CPU pinning it made things much worse (wasting resources when a process wasn't schedulable, thus reducing throughput). – Charles Duffy Sep 11 '19 at 23:44
  • ...so, in my real-world experience, you're better off adjusting priorities ("niceness"), scheduler algorithms, &c. and letting the scheduler do its thing. – Charles Duffy Sep 11 '19 at 23:47
  • (That said, if you still want to implement this, I've given you a hint above -- "CPU pinning" is the keyword/phrase you want to search for). – Charles Duffy Sep 11 '19 at 23:55
  • Thank you and sorry for the late response, @CharlesDuffy. I really appreciate you helping me understand these system-level details. I want to do this because many distributed reinforcement learning papers stress that the number of workers should not exceed the number of CPU cores, so as to avoid contention. But based on my understanding of your comments, it seems that I do not have to deliberately assign a core to each worker. If so, how can I tell whether there is any contention going on? – Maybe Sep 12 '19 at 03:42
  • From a monitoring perspective, I'd suggest running `vmstat 1` and watching it while your process is active -- the column on the far left (labelled `r`) shows the number of processes currently eligible to be scheduled to a CPU (meaning they aren't blocked on I/O or anything else that makes them not currently ready for a CPU). If that number is higher than the number of physical CPU cores in your machine, you have contention. – Charles Duffy Sep 12 '19 at 11:57
  • Thank you so much, and sorry for the late thanks @CharlesDuffy:-) You really helped me understand what's going on here. – Maybe Sep 16 '19 at 14:15
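
The two ideas in the comments above (checking for contention, and "CPU pinning" if it is still wanted) can be sketched in Python roughly as follows. This is a minimal, Linux-only sketch, not a recommended setup: the helper names `parse_procs_running`, `is_contended` and `pin_current_process` are hypothetical, the runnable count is read from the `procs_running` line of `/proc/stat` as a stand-in for watching `vmstat 1`, and `os.cpu_count()` reports logical CPUs, which can be higher than the physical core count mentioned in the comment.

```python
import os


def parse_procs_running(stat_text):
    """Extract the runnable-process count from the text of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("procs_running"):
            return int(line.split()[1])
    raise ValueError("no procs_running line found")


def is_contended(runnable, cpus):
    """Contention, in the sense used above: more runnable processes than CPUs."""
    return runnable > cpus


def pin_current_process(core):
    """Restrict the calling process to one core (Linux only).

    Assumption: this would be called at the start of each worker. As noted
    in the comments, pinning can reduce throughput rather than improve it.
    """
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        runnable = parse_procs_running(f.read())
    cpus = os.cpu_count() or 1  # logical CPUs, not physical cores
    print(f"runnable={runnable} cpus={cpus} contended={is_contended(runnable, cpus)}")
```

A single reading is only a snapshot; sampling it repeatedly while the workers are running gives a picture closer to what `vmstat 1` shows.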

0 Answers