
We're compiling xgboost v0.7 from source on a vanilla Ubuntu Docker image. This image is run on our EC2 instances in a time-critical setting.

Recently we tried the new EC2 c5 instance type, which is supposed to use Intel Skylake-generation CPUs. Very strangely, the same Docker image produces significantly worse timings on the new c5s: 3x slower at the median.

Any ideas on why that might be the case?


This still holds true when compiling xgboost with -march=skylake-avx512.
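
For context, a minimal sketch of this kind of build. The base image, package list, and the ADD_CFLAGS hook are illustrative assumptions, not our exact Dockerfile:

    # Hypothetical sketch: xgboost v0.7 built from source on a vanilla
    # Ubuntu image. Base image and the ADD_CFLAGS hook are assumptions.
    FROM ubuntu:16.04

    RUN apt-get update && apt-get install -y \
        build-essential git python3 python3-pip python3-setuptools
    RUN pip3 install numpy scipy

    RUN git clone --recursive --branch v0.7 \
        https://github.com/dmlc/xgboost /opt/xgboost

    # The architecture flag is injected via the old Makefile's
    # extra-flags hook (assumed); this is where -march=skylake-avx512
    # went for the test mentioned above.
    RUN cd /opt/xgboost && \
        make -j"$(nproc)" ADD_CFLAGS="-march=skylake-avx512"

    RUN cd /opt/xgboost/python-package && python3 setup.py install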

Re'em
  • The same is happening to us on Google Cloud with XGBoost 0.81: prediction latency has increased 3x. How did you end up solving the issue? – Marigold Jan 16 '19 at 08:50
  • We ended up reverting to c4. I was guessing that at some point this would be resolved by an OS update. Back then I tested it with Alpine containers on an Ubuntu 14.04 host. We've changed things since, but I haven't gotten around to checking again. – Re'em Jan 16 '19 at 19:08

2 Answers


I just encountered this post; it looks very much like what we're seeing:

https://aloiskraus.wordpress.com/2018/06/16/why-skylakex-cpus-are-sometimes-50-slower-how-intel-has-broken-existing-code/

Re'em

We had a similar issue (3x worse latency) when migrating to Skylake-generation CPUs on Google Cloud. However, it turned out the real problem was caused by using instances with a high core count (32 cores). For some reason, XGBoost spawned 30 threads per instance, even though predict should run in just a single thread. More details are here: https://github.com/dmlc/xgboost/issues/1345

We fixed it by setting

model._Booster.set_param("nthread", 1)

just after loading the model.
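
For completeness, a minimal sketch of where that line goes. The pickle-based loading and the file name are illustrative; the key part is calling set_param immediately after the model is loaded:

    import pickle

    # Load the trained model (a pickled sklearn-wrapper model here;
    # the path and loading mechanism are illustrative).
    with open("model.pkl", "rb") as f:
        model = pickle.load(f)  # e.g. an xgboost.XGBRegressor

    # Cap prediction to a single thread right after loading, so that
    # predict() no longer spawns one worker thread per core on
    # high-core-count instances.
    model._Booster.set_param("nthread", 1)

    # Subsequent predictions run single-threaded:
    # predictions = model.predict(features)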

Marigold