
I'm doing some hyper-parameter tuning, so speed is key. I've got a nice workstation with both an AMD Ryzen 9 5950x and an NVIDIA RTX3060ti 8GB.

Setup:

  • xgboost 1.5.1 installed from PyPI in an anaconda environment.
  • NVIDIA graphics driver 471.68
  • CUDA 11.0

When training an xgboost model through the scikit-learn API, I pass the tree_method = gpu_hist parameter, and I notice that it is consistently outperformed by tree_method = hist.
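For reference, this is essentially all I change between the two runs (a minimal sketch; every other parameter is left at its default):

    from xgboost import XGBClassifier

    # Minimal sketch: the only difference between the two runs is tree_method.
    gpu_model = XGBClassifier(tree_method='gpu_hist')  # GPU histogram algorithm
    cpu_model = XGBClassifier(tree_method='hist')      # CPU histogram algorithm

    # gpu_model.fit(X_train, y_train)
    # cpu_model.fit(X_train, y_train)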

Somewhat surprisingly, this holds even when I open multiple consoles (I work in Spyder) and start an Optuna study in each of them, each tuning a different scikit-learn model, until my CPU usage is at 100%. Even then, when I compare tree_method = gpu_hist with tree_method = hist, tree_method = hist is still faster!
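A minimal sketch of one such study, assuming X_train and y_train are already loaded (the search space and trial count are illustrative, not my exact setup):

    import optuna
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    def objective(trial):
        # Illustrative search space; the real studies tune more parameters.
        params = {
            'tree_method': 'gpu_hist',  # or 'hist' for the CPU run
            'n_estimators': trial.suggest_int('n_estimators', 100, 500),
            'max_depth': trial.suggest_int('max_depth', 3, 10),
            'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        }
        model = XGBClassifier(**params)
        return cross_val_score(model, X_train, y_train, cv=3).mean()

    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=50)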

How is this possible? Do I have my drivers configured incorrectly? Is my dataset too small to benefit from tree_method = gpu_hist (7000 samples, 50 features, a 3-class classification problem)? Is the RTX3060ti simply outclassed by the AMD Ryzen 9 5950x? Or none of the above?
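For anyone who wants to reproduce the timings without my data, a synthetic dataset of roughly the same shape can be generated like this (the make_classification settings beyond the shape are arbitrary):

    from sklearn.datasets import make_classification

    # Roughly the same shape as my data: 7000 samples, 50 features, 3 classes.
    # n_informative is arbitrary; it just needs to be large enough for 3 classes.
    X_train, y_train = make_classification(
        n_samples=7000,
        n_features=50,
        n_informative=10,
        n_classes=3,
        random_state=0,
    )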

Any help is highly appreciated :)

Edit @Ferdy: I carried out this little experiment:

    import time

    import numpy as np
    from xgboost import XGBClassifier

    def fit_10_times(tree_method, X_train, y_train):
        # Fit a fresh model ten times and record the wall-clock time of each fit.
        times = []
        for i in range(10):
            model = XGBClassifier(tree_method=tree_method)
            start = time.time()
            model.fit(X_train, y_train)
            times.append(time.time() - start)
        return times

    cpu_times = fit_10_times('hist', X_train, y_train)
    gpu_times = fit_10_times('gpu_hist', X_train, y_train)

    print(X_train.describe())
    print('mean cpu training times: ', np.mean(cpu_times), 'standard deviation :', np.std(cpu_times))
    print('all training times :', cpu_times)
    print('----------------------------------')
    print('mean gpu training times: ', np.mean(gpu_times), 'standard deviation :', np.std(gpu_times))
    print('all training times :', gpu_times)

Which yielded this output:

    mean cpu training times:  0.5646213531494141 standard deviation : 0.010005875058323703
    all training times : [0.5690040588378906, 0.5500047206878662, 0.5700047016143799, 0.563004732131958, 0.5570034980773926, 0.5486617088317871, 0.5630037784576416, 0.5680046081542969, 0.57651686668396, 0.5810048580169678]
    ----------------------------------
    mean gpu training times:  2.0273998022079467 standard deviation : 0.05105794761358874
    all training times : [2.0265607833862305, 2.0070691108703613, 1.9900789260864258, 1.9856727123260498, 1.9925382137298584, 2.0021069049835205, 2.1197071075439453, 2.1220884323120117, 2.0516715049743652, 1.9765043258666992]

The peak in CPU usage corresponds to the CPU training runs, and the peak in GPU usage to the GPU training runs, as seen in my task manager right after the experiment.


1 Answer


7000 samples is too small to fill the GPU pipeline; your GPU is likely starving for work. We usually work with millions of samples when using GPU acceleration.
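A rough sketch to see where the crossover happens is to time both tree methods while growing a synthetic dataset of the same shape as yours; the break-even point depends on the hardware:

    import time

    import numpy as np
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    # Rough sketch: time one fit per tree_method at increasing sample counts.
    # The dataset shape mirrors the question (50 features, 3 classes).
    for n_samples in [7_000, 70_000, 700_000]:
        X, y = make_classification(n_samples=n_samples, n_features=50,
                                   n_informative=10, n_classes=3, random_state=0)
        for tree_method in ['hist', 'gpu_hist']:
            model = XGBClassifier(tree_method=tree_method)
            start = time.time()
            model.fit(X, y)
            print(n_samples, tree_method, round(time.time() - start, 2), 'seconds')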

— jiamingy