I have a Python pandas program. When I run it on an AWS EC2 instance with 8 cores, I get 100% CPU utilization and the program finishes in 8 minutes. When I move to 16 cores, I only get 50% CPU utilization and the program still finishes in 8 minutes. I suspected a memory bandwidth bottleneck, so I selected an EC2 X1 instance (x1.16xlarge), which has 64 cores and a claimed memory bandwidth of 300 GB/s. However, it didn't help: the program used only a tiny percentage of the 64 cores and hadn't finished even after 10 minutes.
Any idea what's going on?
(By the way, the same program running on my old desktop tower with 4 cores finished in 16 minutes.)
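To put the numbers above side by side, here is a rough sketch that multiplies core count by reported CPU utilization to estimate how many cores were actually busy. The run labels and the `busy_cores` helper are made up for illustration; the figures are the ones quoted in this question, and utilization on the desktop was not measured.

```python
# Figures copied from the question: (cores, CPU utilization, minutes).
# Utilization on the desktop was not reported, so it is left as None.
RUNS = {
    "desktop": (4, None, 16),
    "ec2_8": (8, 1.00, 8),
    "ec2_16": (16, 0.50, 8),
}


def busy_cores(cores, utilization):
    """Average number of cores kept busy; None if utilization is unknown."""
    if utilization is None:
        return None
    return cores * utilization


for name, (cores, utilization, minutes) in RUNS.items():
    print(name, "busy cores:", busy_cores(cores, utilization), "minutes:", minutes)
```

By this estimate both EC2 runs kept about 8 cores busy, which is consistent with them taking the same 8 minutes.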
More details: the program uses GridSearchCV, which in turn uses joblib to run multiple processes in parallel. The number of processes always equals the number of cores in the system.
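For reference, the setup looks roughly like the sketch below. This is not my actual code: the synthetic data, the estimator, the parameter grid, and the `count_fit_tasks` helper are all placeholders, and `n_jobs=-1` is assumed to be how the process count ends up equal to the core count. The sketch also counts the independent fits the search can hand to joblib (parameter combinations times CV folds), since that count is an upper bound on how many worker processes can be busy at once.

```python
import os

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid


def count_fit_tasks(param_grid, cv):
    """Number of independent fits: parameter combinations times CV folds."""
    return len(ParameterGrid(param_grid)) * cv


if __name__ == "__main__":
    # Placeholder data and model, standing in for the real pandas workload.
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    param_grid = {"n_estimators": [10, 20], "max_depth": [2, 4]}
    cv = 3

    print("cores:", os.cpu_count())
    print("parallel fit tasks:", count_fit_tasks(param_grid, cv))

    # n_jobs=-1 asks joblib to use one worker per available core (assumed setup).
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid,
        cv=cv,
        n_jobs=-1,
    )
    # The final refit on the best parameters runs after the parallel fits.
    search.fit(X, y)
    print("best params:", search.best_params_)
```

If the number of fit tasks is smaller than the number of cores, the extra worker processes have nothing to do, so comparing the two printed numbers on each machine may be a useful first check.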