
I have a Python/pandas program. When I run it on an AWS EC2 instance with 8 cores, I get 100% CPU utilization and the program finishes in 8 minutes. If I move to 16 cores, I only get 50% CPU utilization and the program still finishes in 8 minutes. I suspected a memory bandwidth bottleneck, so I selected an X1.16xlarge instance, which has 64 cores and a claimed memory bandwidth of 300 GB/s. However, that didn't help: the program used a tiny percentage of the 64 cores and hadn't finished even after 10 minutes.

Any idea what's going on?

(by the way, the same program running on my old desktop tower with 4 cores finished in 16 minutes)

More details: the program uses GridSearchCV, which in turn uses joblib to run multiple processes in parallel. The number of processes always equals the number of cores in the system.
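For reference, GridSearchCV exposes the worker count through its `n_jobs` parameter, so it can be capped explicitly rather than left to match the core count. A minimal sketch, where the dataset and estimator are illustrative placeholders and not from the original program:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset standing in for the real pandas data
X, y = make_classification(n_samples=200, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8]},
    n_jobs=4,   # cap the number of joblib worker processes explicitly
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

Note that GridSearchCV only parallelizes across parameter candidates and CV folds, so setting `n_jobs` higher than the number of candidate/fold tasks leaves the extra workers idle.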

frank

1 Answer


Python has a "global interpreter lock" (GIL) that limits its ability to use multiple threads efficiently. One responder on this site suggested writing time-critical functions in C/C++.

If possible, consider breaking your analysis into smaller chunks and running them as separate processes to sidestep the global interpreter lock.

nutcase
  • This is a great point but I am not sure it was what happened to me. Let me add more details to the original question. Thanks. – frank Mar 22 '18 at 23:57
  • I checked every process that joblib starts. There is only one thread attached. – frank Mar 23 '18 at 00:03