
I want to train LightGBM on a GPU with my dataset in Google Colaboratory (I selected the Python 3 runtime with the GPU accelerator). To do this, I used the following chunk of code:

!apt-get -qq install --no-install-recommends nvidia-375
!apt-get -qq install --no-install-recommends nvidia-opencl-icd-375 nvidia-opencl-dev opencl-headers
#!apt-get update
!apt-get install --no-install-recommends git cmake build-essential libboost-dev libboost-system-dev libboost-filesystem-dev ocl-icd-libopencl1 ocl-icd-opencl-dev
!pip install -qq lightgbm --install-option=--gpu 

In the notebook I also set the device to gpu:

from lightgbm import LGBMClassifier

clf = LGBMClassifier(
        n_estimators=10000,
        learning_rate=0.03,
        num_leaves=30,
        colsample_bytree=.8,
        subsample=.9,
        max_depth=7,
        reg_alpha=.1,
        reg_lambda=.1,
        min_split_gain=.01,
        min_child_weight=2,
        silent=-1,
        verbose=-1,
        device='gpu',  # request the OpenCL GPU backend
        # gpu_platform_id=0,
        # gpu_device_id=0,
        )

And got this:

LightGBMError                             Traceback (most recent call last)
<ipython-input-10-936c00d106e3> in <module>()
     50     clf.fit(trn_x, trn_y, 
     51             eval_set= [(trn_x, trn_y), (val_x, val_y)],
---> 52             eval_metric='auc', verbose=100, early_stopping_rounds=100  #30
     53            )
     54 

/usr/local/lib/python3.6/dist-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks)
    673                                         verbose=verbose, feature_name=feature_name,
    674                                         categorical_feature=categorical_feature,
--> 675                                         callbacks=callbacks)
    676         return self
    677 

/usr/local/lib/python3.6/dist-packages/lightgbm/sklearn.py in fit(self, X, y, sample_weight, init_score, group, eval_set, eval_names, eval_sample_weight, eval_class_weight, eval_init_score, eval_group, eval_metric, early_stopping_rounds, verbose, feature_name, categorical_feature, callbacks)
    467                               verbose_eval=verbose, feature_name=feature_name,
    468                               categorical_feature=categorical_feature,
--> 469                               callbacks=callbacks)
    470 
    471         if evals_result:

/usr/local/lib/python3.6/dist-packages/lightgbm/engine.py in train(params, train_set, num_boost_round, valid_sets, valid_names, fobj, feval, init_model, feature_name, categorical_feature, early_stopping_rounds, evals_result, verbose_eval, learning_rates, keep_training_booster, callbacks)
    178     # construct booster
    179     try:
--> 180         booster = Booster(params=params, train_set=train_set)
    181         if is_valid_contain_train:
    182             booster.set_train_data_name(train_data_name)

/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in __init__(self, params, train_set, model_file, silent)
   1303                 train_set.construct().handle,
   1304                 c_str(params_str),
-> 1305                 ctypes.byref(self.handle)))
   1306             # save reference to data
   1307             self.train_set = train_set

/usr/local/lib/python3.6/dist-packages/lightgbm/basic.py in _safe_call(ret)
     46     """
     47     if ret != 0:
---> 48         raise LightGBMError(_LIB.LGBM_GetLastError())
     49 
     50 

LightGBMError: b'No OpenCL device found'

I also tried the solution from "Installing GPU support for LightGBM on Google Collab", but nothing changed.

Dmitriy Kisil

2 Answers


I followed the suggestion in https://github.com/microsoft/LightGBM/issues/586, but it did not solve my problem. It turned out that LightGBM could not locate libnvidia-opencl.so. So I set the path to libnvidia-opencl.so.1 explicitly, which in my case is /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1. Then it worked.

A one-line solution is:

mkdir -p /etc/OpenCL/vendors && echo "/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd

Of course, you have to make sure the NVIDIA driver is properly installed. For Ubuntu 18.04, you can follow these instructions: https://www.linuxbabe.com/ubuntu/install-nvidia-driver-ubuntu-18-04
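
If you would rather do this from a notebook cell than from the shell, here is a rough Python sketch of the same idea. The helper names list_icds and register_icd are made up for illustration and are not part of LightGBM or any OpenCL package. It assumes the vendors directory is /etc/OpenCL/vendors as in the command above, and that the process may write there (Colab normally runs as root). The library path is whatever exists on your machine, so yours may differ from mine.

```python
import os
from pathlib import Path


def list_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd file name: library string it contains} for a vendors directory."""
    vendors = Path(vendors_dir)
    if not vendors.is_dir():
        return {}
    return {p.name: p.read_text().strip() for p in sorted(vendors.glob("*.icd"))}


def register_icd(lib_path, vendors_dir="/etc/OpenCL/vendors", name="nvidia.icd"):
    """Write an ICD file that points at lib_path.

    Raises FileNotFoundError if the library is missing, so a wrong path
    is caught here rather than at training time.
    """
    if not os.path.exists(lib_path):
        raise FileNotFoundError(lib_path)
    vendors = Path(vendors_dir)
    vendors.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    icd = vendors / name
    icd.write_text(lib_path + "\n")
    return icd
```

Calling list_icds() first shows what is already registered. An empty result, or an entry naming a library that does not exist, would be consistent with the 'No OpenCL device found' error. After that, register_icd("/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1") does the same thing as the one-liner.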

pateheo

I got the same error when running different code. I solved it by disabling MIG and rebooting the machine.

sudo nvidia-smi -mig 0
sudo reboot
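
Before rebooting, it may be worth checking whether MIG is actually enabled. A minimal sketch, assuming your driver's nvidia-smi supports the mig.mode.current query field (older drivers may not). The function names parse_mig_modes and current_mig_modes are hypothetical helpers, not part of any library.

```python
import subprocess


def parse_mig_modes(text):
    """Split nvidia-smi CSV output (no header) into one mode string per GPU."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def current_mig_modes():
    """Return the MIG mode reported for each GPU, or None if the query fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=mig.mode.current", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # nvidia-smi is missing, or it rejected the query
        return None
    return parse_mig_modes(result.stdout)
```

If current_mig_modes() reports Enabled for the GPU you train on, the two commands above are the thing to try. GPUs without MIG support typically report [N/A], in which case this answer probably does not apply.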
infal