
I'm doing some hyper-parameter tuning, so speed is key. I've got a nice workstation with both an AMD Ryzen 9 5950x and an NVIDIA RTX3060ti 8GB.

Setup:

  • xgboost 1.5.1 installed from PyPI in an anaconda environment.
  • NVIDIA graphics driver 471.68
  • CUDA 11.0

When training an xgboost model through the scikit-learn API, I pass the tree_method = gpu_hist parameter, and I notice that it is consistently outperformed by tree_method = hist.
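For reference, this is essentially all I change between the two runs (a minimal sketch; every other parameter is left at its default):

    from xgboost import XGBClassifier

    # Minimal sketch: the only difference between the two runs is tree_method.
    gpu_model = XGBClassifier(tree_method='gpu_hist')  # GPU histogram algorithm
    cpu_model = XGBClassifier(tree_method='hist')      # CPU histogram algorithm

    # gpu_model.fit(X_train, y_train)
    # cpu_model.fit(X_train, y_train)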

Somewhat surprisingly, this holds even when I open multiple consoles (I work in Spyder) and start an Optuna study in each of them, each tuning a different scikit-learn model, until my CPU usage is at 100%. Even then, when I compare tree_method = gpu_hist with tree_method = hist, tree_method = hist is still faster!
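A minimal sketch of one such study, assuming X_train and y_train are already loaded (the search space and trial count are illustrative, not my exact setup):

    import optuna
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    def objective(trial):
        # Illustrative search space; the real studies tune more parameters.
        params = {
            'tree_method': 'gpu_hist',  # or 'hist' for the CPU run
            'n_estimators': trial.suggest_int('n_estimators', 100, 500),
            'max_depth': trial.suggest_int('max_depth', 3, 10),
            'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        }
        model = XGBClassifier(**params)
        return cross_val_score(model, X_train, y_train, cv=3).mean()

    study = optuna.create_study(direction='maximize')
    study.optimize(objective, n_trials=50)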

How is this possible? Do I have my drivers configured incorrectly? Is my dataset too small to benefit from tree_method = gpu_hist (7000 samples, 50 features, a 3-class classification problem)? Is the RTX3060ti simply outclassed by the AMD Ryzen 9 5950x? Or none of the above?
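For anyone who wants to reproduce the timings without my data, a synthetic dataset of roughly the same shape can be generated like this (the make_classification settings beyond the shape are arbitrary):

    from sklearn.datasets import make_classification

    # Roughly the same shape as my data: 7000 samples, 50 features, 3 classes.
    # n_informative is arbitrary; it just needs to be large enough for 3 classes.
    X_train, y_train = make_classification(
        n_samples=7000,
        n_features=50,
        n_informative=10,
        n_classes=3,
        random_state=0,
    )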

Any help is highly appreciated :)

Edit @Ferdy: I carried out this little experiment:

    import time

    import numpy as np
    from xgboost import XGBClassifier

    def fit_10_times(tree_method, X_train, y_train):
        # Fit a fresh model ten times and record the wall-clock time of each fit.
        times = []
        for i in range(10):
            model = XGBClassifier(tree_method=tree_method)
            start = time.time()
            model.fit(X_train, y_train)
            times.append(time.time() - start)
        return times

    cpu_times = fit_10_times('hist', X_train, y_train)
    gpu_times = fit_10_times('gpu_hist', X_train, y_train)

    print(X_train.describe())
    print('mean cpu training times: ', np.mean(cpu_times), 'standard deviation :', np.std(cpu_times))
    print('all training times :', cpu_times)
    print('----------------------------------')
    print('mean gpu training times: ', np.mean(gpu_times), 'standard deviation :', np.std(gpu_times))
    print('all training times :', gpu_times)

Which yielded this output:

    mean cpu training times:  0.5646213531494141 standard deviation : 0.010005875058323703
    all training times : [0.5690040588378906, 0.5500047206878662, 0.5700047016143799, 0.563004732131958, 0.5570034980773926, 0.5486617088317871, 0.5630037784576416, 0.5680046081542969, 0.57651686668396, 0.5810048580169678]
    ----------------------------------
    mean gpu training times:  2.0273998022079467 standard deviation : 0.05105794761358874
    all training times : [2.0265607833862305, 2.0070691108703613, 1.9900789260864258, 1.9856727123260498, 1.9925382137298584, 2.0021069049835205, 2.1197071075439453, 2.1220884323120117, 2.0516715049743652, 1.9765043258666992]

The peak in CPU usage corresponds to the CPU training runs, and the peak in GPU usage to the GPU training runs, as seen in my task manager right after the experiment.


1 Answer


7000 samples is too small to fill the GPU pipeline; your GPU is likely starving for work. We usually work with millions of samples when using GPU acceleration.
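A rough sketch to see where the crossover happens is to time both tree methods while growing a synthetic dataset of the same shape as yours; the break-even point depends on the hardware:

    import time

    import numpy as np
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    # Rough sketch: time one fit per tree_method at increasing sample counts.
    # The dataset shape mirrors the question (50 features, 3 classes).
    for n_samples in [7_000, 70_000, 700_000]:
        X, y = make_classification(n_samples=n_samples, n_features=50,
                                   n_informative=10, n_classes=3, random_state=0)
        for tree_method in ['hist', 'gpu_hist']:
            model = XGBClassifier(tree_method=tree_method)
            start = time.time()
            model.fit(X, y)
            print(n_samples, tree_method, round(time.time() - start, 2), 'seconds')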

— jiamingy